Thanks for the suggestion, Even. We’ll see how effective Zarr is for our 
datasets.

Jesse

From: Even Rouault <even.roua...@spatialys.com>
Date: Friday, December 8, 2023 at 12:20 PM
To: "Meyer, Jesse R. (GSFC-618.0)[SCIENCE SYSTEMS AND APPLICATIONS INC]" 
<jesse.r.me...@nasa.gov>, gdallists <gdal-dev@lists.osgeo.org>
Subject: [EXTERNAL] Re: [gdal-dev] GTiff bit shuffle compression feature request

Jesse,

This would break interoperability with other TIFF readers... Even adding a new 
TIFF tag to advertise that bit shuffling is applied would probably not be a 
sufficient guard, as existing readers wouldn't read it and would just display 
garbage, which is worse than not being able to open the file at all. The only 
way I can think of to do that safely would be to use new values for 
the Compression tag, which isn't pretty either.

You should probably try Zarr, which has such a capability with the Blosc codec. 
Cf. https://gdal.org/drivers/raster/zarr.html : BLOSC_SHUFFLE
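
As a rough sketch of what trying this could look like from Python, assuming 
the GDAL Python bindings are installed and GDAL was built with Blosc support: 
the helper names and the file paths are hypothetical, and the codec and level 
chosen here are arbitrary illustrative values, not recommendations.

```python
def zarr_bitshuffle_options(cname="zstd", clevel=5):
    """Build Zarr creation options that request Blosc with bit shuffling.

    cname and clevel are illustrative choices only.
    """
    return [
        "COMPRESS=BLOSC",
        f"BLOSC_CNAME={cname}",
        f"BLOSC_CLEVEL={clevel}",
        "BLOSC_SHUFFLE=BIT",
    ]


def convert_to_zarr(src_path, dst_path):
    """Hypothetical helper: copy a raster to Zarr with the options above."""
    # Imported lazily so the option builder works without GDAL installed.
    from osgeo import gdal

    gdal.UseExceptions()
    return gdal.Translate(
        dst_path,
        src_path,
        format="Zarr",
        creationOptions=zarr_bitshuffle_options(),
    )
```

Running convert_to_zarr once with BLOSC_SHUFFLE=BIT and once with 
BLOSC_SHUFFLE=BYTE on the same input would give a direct comparison of the 
resulting sizes.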

I'm curious, however, to know what compression gain you typically get with that.

Even


On 08/12/2023 at 18:06, Meyer, Jesse R. (GSFC-618.0)[SCIENCE SYSTEMS AND 
APPLICATIONS INC] via gdal-dev wrote:
Hi,

When using horizontal differencing to reduce the numerical range of band data, 
the upper bytes in the produced stream are typically 0, which suits LZ’s 
byte-based compression model.  But the least significant bytes can still have 
many of their most significant bits set to 0.  Unless the whole byte is 
repeated, however, LZ compressors can’t do much to exploit that pattern.  For 
data with temporal and/or spatial coherence, ‘shuffling’ is another effective 
strategy to losslessly rearrange the data stream so that it is favorable to 
LZ-style compressors, and it builds on the gains already provided by the 
PREDICTOR functionality.
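
To make the byte layout concrete, here is a minimal sketch of horizontal 
differencing on 16-bit samples.  The scanline is made-up, slowly increasing 
data and the function names are hypothetical; this illustrates the idea and is 
not GDAL's or libtiff's PREDICTOR code.

```python
import struct


def horizontal_diff(samples):
    """Keep the first sample; replace each later one by its difference
    from the previous sample, modulo 2**16."""
    out = [samples[0]]
    for prev, cur in zip(samples, samples[1:]):
        out.append((cur - prev) & 0xFFFF)
    return out


def undo_horizontal_diff(diffs):
    """Invert horizontal_diff with a running sum, modulo 2**16."""
    out = [diffs[0]]
    for d in diffs[1:]:
        out.append((out[-1] + d) & 0xFFFF)
    return out


# Hypothetical, slowly increasing 16-bit scanline (made-up data).
row = [1000 + 3 * i + (i * i) % 5 for i in range(4096)]
diffs = horizontal_diff(row)

# Pack as little-endian uint16 and split into low and high bytes.
packed = struct.pack("<%dH" % len(diffs), *diffs)
low_bytes = packed[0::2]
high_bytes = packed[1::2]
```

After the first sample, every high byte is 0, while the low bytes hold small 
values whose upper bits are also 0 but whose lower bits keep changing, which 
is the pattern that bit shuffling targets.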

The notion is to arrange the bit stream so that the Nth “shuffled” byte 
contains the Nth bit from each byte in the sequence.  The sequence length is 
usually determined by the bit length of the data type.

For example (for brevity, assume bytes are 4 bits long):

Byte 1, Byte 2, Byte 3, Byte 4
0001, 0011, 0111, 0001

They all share a 0 in the top bit and a 1 in the bottom bit:

“Shuffled”
0000, 0010, 0110, 1111
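
The 4-bit example above can be reproduced with a short sketch.  The function 
names are hypothetical, and this is a plain bit transpose written for clarity 
rather than a SIMD-accelerated implementation.

```python
def bit_shuffle(values, width):
    """Return `width` words; word n collects bit n (most significant
    first) of every input value, in input order."""
    shuffled = []
    for n in range(width):
        shift = width - 1 - n          # n = 0 selects the top bit
        word = 0
        for v in values:
            word = (word << 1) | ((v >> shift) & 1)
        shuffled.append(word)
    return shuffled


def bit_unshuffle(shuffled, count):
    """Invert bit_shuffle for `count` original values."""
    values = []
    for i in range(count):
        shift = count - 1 - i          # position of value i inside each word
        v = 0
        for word in shuffled:
            v = (v << 1) | ((word >> shift) & 1)
        values.append(v)
    return values


original = [0b0001, 0b0011, 0b0111, 0b0001]
shuffled = bit_shuffle(original, width=4)
restored = bit_unshuffle(shuffled, count=len(original))
```

Here shuffled equals [0b0000, 0b0010, 0b0110, 0b1111], matching the example, 
and restored equals the original sequence, so the transform is lossless.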

The algorithm is pretty simple to implement, and can be SIMD-accelerated for 
high performance.

While we are specifically users of the GTiff format, such a strategy could be 
applied generically to most raster and even vector formats.

Best,
Jesse



_______________________________________________
gdal-dev mailing list
gdal-dev@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/gdal-dev

--
http://www.spatialys.com
My software is free, but my time generally not.