Hi,

In my (limited) experience, it can really make a difference for floating-point 
rasters. Testing with a small one I have on hand (10980x10980x1, Float32), I 
get:

 - GeoTIFF DEFLATE 280 MB
 - Zarr BLOSC zlib NONE 281 MB
 - Zarr BLOSC zlib BIT 253 MB
 - Zarr BLOSC zlib BYTE 249 MB
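
A rough sketch of how the three Zarr variants could be produced, for anyone who wants to repeat the comparison on their own data. It only builds gdal_translate command lines and assumes the COMPRESS, BLOSC_CNAME and BLOSC_SHUFFLE creation options described in the Zarr driver documentation. The helper name `zarr_translate_args` and the file names are placeholders, not the exact commands behind the numbers above.

```python
import shlex


def zarr_translate_args(src, dst, shuffle):
    """Build a gdal_translate command line for a Blosc/zlib-compressed Zarr.

    `shuffle` is passed through as the BLOSC_SHUFFLE creation option.
    """
    if shuffle not in ("NONE", "BYTE", "BIT"):
        raise ValueError(f"unsupported BLOSC_SHUFFLE value: {shuffle}")
    return [
        "gdal_translate", "-of", "Zarr",
        "-co", "COMPRESS=BLOSC",
        "-co", "BLOSC_CNAME=zlib",
        "-co", f"BLOSC_SHUFFLE={shuffle}",
        src, dst,
    ]


if __name__ == "__main__":
    # input.tif and out_*.zarr are hypothetical file names.
    for mode in ("NONE", "BIT", "BYTE"):
        args = zarr_translate_args("input.tif", f"out_{mode.lower()}.zarr", mode)
        print(shlex.join(args))
```

The printed lines can be pasted into a shell; comparing the sizes of the resulting directories gives the same kind of table as above.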

Laurentiu

On Fri, Dec 8, 2023, at 19:19, Even Rouault via gdal-dev wrote:
> Jesse,
> 
> This would break interoperability with other TIFF readers... Even adding a 
> new TIFF tag to advertise that bit shuffling is applied would probably not be 
> a sufficient guard, as existing readers wouldn't read it and would just 
> display garbage, which is worse than not being able to open the file at all. 
> The only way I can think of to do that safely would be to use new 
> values for the Compression tag, which isn't pretty either.
> 
> You should probably try Zarr which has such capability with the Blosc codec. 
> Cf https://gdal.org/drivers/raster/zarr.html : BLOSC_SHUFFLE
> 
> I'm curious, however, to know what compression gain you typically get with that.
> 
> Even
> 
> 
> 
> On 08/12/2023 at 18:06, Meyer, Jesse R. (GSFC-618.0)[SCIENCE SYSTEMS AND 
> APPLICATIONS INC] via gdal-dev wrote:
>> Hi,
>>  
>> When using horizontal differencing to reduce the numerical range of band 
>> data, the upper bytes in the produced stream are typically 0, which plays to 
>> LZ’s byte-based compression model.  But the least significant bytes can 
>> still have many of their upper bits equal to 0. Unless the whole byte is 
>> repeated, however, LZ compressors can’t do much with that pattern.  For data 
>> with temporal and/or spatial coherence, ‘shuffling’ is another effective 
>> strategy to losslessly rearrange the data stream into a form favorable to 
>> LZ-style compressors, and it builds on the gains already provided by the 
>> PREDICTOR functionality.
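
To make the point about upper and lower bytes concrete, here is a small sketch of horizontal differencing on 16-bit samples. It is only an illustration under assumptions: the sample values are made up, the function names are hypothetical, and this is not the code used by libtiff's PREDICTOR.

```python
import struct


def horizontal_diff(row):
    """Replace each sample (except the first) by its difference from the
    previous sample, wrapped to 16 bits."""
    if not row:
        return []
    out = [row[0]]
    for prev, cur in zip(row, row[1:]):
        out.append((cur - prev) & 0xFFFF)
    return out


def undo_horizontal_diff(diffs):
    """Inverse of horizontal_diff: a running sum, wrapped to 16 bits."""
    if not diffs:
        return []
    out = [diffs[0]]
    for d in diffs[1:]:
        out.append((out[-1] + d) & 0xFFFF)
    return out


if __name__ == "__main__":
    row = [1000, 1003, 1007, 1008, 1012]      # made-up UInt16 samples
    diffs = horizontal_diff(row)               # [1000, 3, 4, 1, 4]
    packed = struct.pack("<5H", *diffs)        # little-endian 16-bit words
    print([format(b, "08b") for b in packed[1::2]])  # high bytes
    print([format(b, "08b") for b in packed[0::2]])  # low bytes
```

After the first sample the high bytes are all zero, which an LZ compressor handles well. The low bytes (00000011, 00000100, 00000001, 00000100) differ from one another even though their upper bits are all zero, which is the pattern shuffling is meant to expose.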
>>  
>> The idea is to arrange the bit stream so that the Nth “shuffled” byte 
>> contains the Nth bit from each byte in the sequence.  The sequence length is 
>> usually determined by the bit length of the data type.
>>  
>> For example (for brevity, assume bytes are 4 bits long)
>>  
>> Byte 1,  Byte 2, Byte 3, Byte 4
>> 0001, 0011, 0111, 0001
>>  
>> They all share the top 0 bit and the bottom 1 bit.
>>  
>> “Shuffled”:
>> 0000, 0010, 0110, 1111
>>  
>> The algorithm is pretty simple to implement, and can be SIMD accelerated for 
>> high performance.
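
The 4-bit example above can be reproduced with a plain (non-SIMD) bit transpose. This is a minimal sketch, assuming a square layout of `width` values of `width` bits each. The name `bit_shuffle` is hypothetical and this is not how Blosc or any particular library implements it.

```python
def bit_shuffle(values, width):
    """Transpose bits: output word n collects bit n (most significant
    first) of every input value, in input order."""
    if len(values) != width:
        raise ValueError("this toy version needs exactly `width` values")
    shuffled = []
    for n in range(width):
        shift = width - 1 - n          # bit n counted from the top
        word = 0
        for v in values:
            word = (word << 1) | ((v >> shift) & 1)
        shuffled.append(word)
    return shuffled


if __name__ == "__main__":
    data = [0b0001, 0b0011, 0b0111, 0b0001]
    out = bit_shuffle(data, 4)
    print([format(w, "04b") for w in out])   # ['0000', '0010', '0110', '1111']
```

Because this is a square transpose, applying `bit_shuffle` a second time restores the original values, so the step is lossless.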
>>  
>> While we specifically are users of the GTIFF format, such a strategy could 
>> be employed generically for most raster and even vector formats.
>>  
>> Best,
>> Jesse
>> 
>> _______________________________________________
>> gdal-dev mailing list
>> gdal-dev@lists.osgeo.org
>> https://lists.osgeo.org/mailman/listinfo/gdal-dev
>> 
> -- 
> http://www.spatialys.com
> My software is free, but my time generally not.