Hi,

In my (limited) experience, it can really make a difference for floating-point rasters. Testing with a small one I have on hand (10980x10980x1, Float32), I get:
- GeoTIFF DEFLATE 280 MB
- Zarr BLOSC zlib NONE 281 MB
- Zarr BLOSC zlib BIT 253 MB
- Zarr BLOSC zlib BYTE 249 MB

Laurentiu

On Fri, Dec 8, 2023, at 19:19, Even Rouault via gdal-dev wrote:
> Jesse,
>
> This would break interoperability with other TIFF readers... Even adding a
> new TIFF tag to advertize that bit shuffling is applied would probably not be
> a sufficient guard, as existing readers wouldn't read it, and would just
> display garbage, which is worse than not being able to open the file at all.
> The only way I can think of doing that in a safe way would be to use new
> values for the Compression tag, which isn't pretty either.
>
> You should probably try Zarr which has such capability with the Blosc codec.
> Cf https://gdal.org/drivers/raster/zarr.html : BLOSC_SHUFFLE
>
> I'm curious however to know which typical compression gain you get with that.
>
> Even
>
>
>
> On 08/12/2023 at 18:06, Meyer, Jesse R. (GSFC-618.0)[SCIENCE SYSTEMS AND
> APPLICATIONS INC] via gdal-dev wrote:
>> Hi,
>>
>> When using horizontal differencing to reduce the numerical range of band
>> data, the upper bytes in the produced stream are typically 0 which leverages
>> LZ’s byte based compression model. But the least significant bytes can
>> still have many significant bits as 0. Unless the whole byte is replicated,
>> LZ compressors can’t do much to leverage the pattern however. For data with
>> temporal and/or spatial coherence, ‘shuffling’ is another effective strategy
>> to losslessly reform the data stream to be favorable to LZ style
>> compressors. It also plays nicely off gains already provided by the PREDICTOR
>> functionality.
>>
>> The notion is to arrange the bit stream where the Nth “shuffled” byte
>> contains the Nth bit from each byte in the sequence. The sequence length is
>> usually determined by the data type bit length.
>>
>> For example (for brevity, assume bytes are 4 bits long)
>>
>> Byte 1, Byte 2, Byte 3, Byte 4
>> 0001, 0011, 0111, 0001
>>
>> They all share the top 0 bit and the bottom 1 bit,
>>
>> “Shuffled”
>> 0000, 0010, 0110, 1111
>>
>> The algorithm is pretty simple to implement, and can be SIMD accelerated for
>> high performance.
>>
>> While we specifically are users of the GTIFF format, such a strategy could
>> be employed generically for most raster and even vector formats.
>>
>> Best,
>> Jesse
>>
>> _______________________________________________
>> gdal-dev mailing list
>> gdal-dev@lists.osgeo.org
>> https://lists.osgeo.org/mailman/listinfo/gdal-dev
>>
> --
> http://www.spatialys.com
> My software is free, but my time generally not.
> _______________________________________________
> gdal-dev mailing list
> gdal-dev@lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/gdal-dev
>
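For anyone who wants to play with the idea, the 4-bit example from Jesse's message can be reproduced with a few lines of plain Python. This is only an illustrative sketch: the names bit_shuffle and bit_unshuffle are hypothetical (mine, not from GDAL or Blosc), and it makes no attempt to mirror how a real codec lays out or accelerates the transform. It just treats the input as a small bit matrix and transposes it.

```python
def bit_shuffle(values, width):
    """Transpose a bit matrix.

    `values` is a list of unsigned integers, each `width` bits wide.
    Output word i collects bit i (most significant bit first) of every
    input value, so each output word is len(values) bits wide.
    """
    shuffled = []
    for bit in range(width - 1, -1, -1):      # MSB first
        word = 0
        for v in values:
            word = (word << 1) | ((v >> bit) & 1)
        shuffled.append(word)
    return shuffled


def bit_unshuffle(words, count):
    """Inverse of bit_shuffle: `count` is the number of original values.

    Transposing the bit matrix a second time (with the dimensions
    swapped) restores the original values.
    """
    return bit_shuffle(words, count)


if __name__ == "__main__":
    # The example from the quoted message, with 4-bit "bytes".
    data = [0b0001, 0b0011, 0b0111, 0b0001]
    shuffled = bit_shuffle(data, 4)
    print(", ".join(format(w, "04b") for w in shuffled))
    # prints: 0000, 0010, 0110, 1111
    assert bit_unshuffle(shuffled, len(data)) == data
```

The shared top 0 bits and bottom 1 bits end up as whole words of identical bits (0000 and 1111), which is the kind of run an LZ-style compressor can pick up.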