Leigh,

Thank you for your report. We will be looking into the problem shortly.
Elena

On Nov 9, 2010, at 4:23 PM, Leigh Orf wrote:

> I am experimenting with the n-bit filter on floating-point data to
> reduce file size. I have been tweaking the datatype's precision
> settings to discard low-order bits. Mostly, things are working;
> however, I am a bit confused.
>
> In the documentation
> (http://www.hdfgroup.org/HDF5/doc/RM/RM_H5P.html#Property-SetNbit) I
> read the following statement:
>
> "By nature, the N-Bit filter should not be used together with other
> I/O filters"
>
> I also read the following in the documentation
> (http://www.hdfgroup.org/HDF5/doc/UG/10_Datasets.html), in the
> discussion of integer n-bit compression:
>
> "After n-bit compression, none of these discarded bits, known as
> padding bits, will be stored on disk."
>
> While no such statement is found under the floating-point discussion
> below that, I would assume it also holds for floating-point data.
>
> There is also this statement:
>
> "The n-bit decompression algorithm is very similar to n-bit
> compression. The only difference is that at the byte level,
> compression packs out all padding bits and stores only significant
> bits into a continuous buffer (unsigned char) while decompression
> unpacks significant bits and inserts padding bits (zeros) at the
> proper positions to recover the data bytes as they existed before
> compression."
>
> So, when I look at all these statements combined, I am led to believe
> that just applying n-bit compression should give me a good reduction
> in file size. However, that does not happen.
>
> Using the example C code found on
> http://www.hdfgroup.org/HDF5/doc/UG/10_Datasets.html, I modified it to
> fill a 2D array of size 1000x1000 with a known function (something
> involving logs and sines to get variability roughly matching what I
> find in my real scientific data sets).
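>
> Roughly, the relevant part of my modified version looks like the
> sketch below (condensed, with error checking and the data fill
> omitted; the datatype field layout is the one h5ls reports further
> down):
>
>     #include "hdf5.h"
>
>     static float data[1000][1000];     /* filled with the known function */
>
>     int main(void)
>     {
>         /* 16-bit float inside a 32-bit big-endian float: 1 sign +
>          * 6 exponent + 9 significand bits, 7 padding bits below.
>          * Order matters: set fields -> offset -> precision -> size. */
>         hid_t dtype = H5Tcopy(H5T_IEEE_F32BE);
>         H5Tset_fields(dtype, 22, 16, 6, 7, 9);
>         H5Tset_offset(dtype, 7);
>         H5Tset_precision(dtype, 16);
>         H5Tset_size(dtype, 4);
>         H5Tset_ebias(dtype, 31);           /* bias 0x1f */
>
>         hsize_t dims[2] = {1000, 1000};
>         hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
>         H5Pset_chunk(dcpl, 2, dims);       /* n-bit requires chunking */
>         H5Pset_nbit(dcpl);                 /* enable the n-bit filter */
>
>         hid_t file  = H5Fcreate("nbit.h5", H5F_ACC_TRUNC,
>                                 H5P_DEFAULT, H5P_DEFAULT);
>         hid_t space = H5Screate_simple(2, dims, NULL);
>         hid_t dset  = H5Dcreate2(file, "nbit_float", dtype, space,
>                                  H5P_DEFAULT, dcpl, H5P_DEFAULT);
>         H5Dwrite(dset, H5T_NATIVE_FLOAT, H5S_ALL, H5S_ALL,
>                  H5P_DEFAULT, data);
>
>         H5Dclose(dset); H5Sclose(space); H5Fclose(file);
>         H5Pclose(dcpl); H5Tclose(dtype);
>         return 0;
>     }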
>
> Below are four HDF5 files:
>
> seagrape:/users/orf/test% ls -l *.h5
> -rw-r--r-- 1 orf users 4004016 Nov  9 15:01 uncompressed-float.h5
> -rw-r--r-- 1 orf users 4004016 Nov  9 15:01 nbit.h5
> -rw-r--r-- 1 orf users 3398723 Nov  9 15:02 gzip-compressed-float.h5
> -rw-r--r-- 1 orf users  880108 Nov  9 15:02 nbit-gzip.h5
>
> uncompressed-float.h5 is without any compression whatsoever. As
> expected, the file is roughly 1000x1000x4 bytes in size.
>
> nbit.h5 has the n-bit filter applied. It is the same size!
>
> gzip-compressed-float.h5 is the floating-point data with gzip
> (level 6) applied, so it's lossless.
>
> nbit-gzip.h5 has the n-bit filter followed by the gzip filter.
> Lots o' compression!!
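>
> For nbit-gzip.h5 the only difference is one extra call on the same
> dataset creation property list, so the two filters run in that order
> (again, a sketch):
>
>     H5Pset_nbit(dcpl);        /* first pack out the padding bits */
>     H5Pset_deflate(dcpl, 6);  /* then gzip (level 6) the packed bytes */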
>
> So, it seems to me that the n-bit filter applied to floating-point
> data stores the zeroed padding bits...? I thought I would see the
> file-size reduction without having to apply gzip compression.
>
> I have done several tests, and I am sure that the nbit.h5 file has a
> loss of precision when I subtract the lossless data from it.
>
> Some more info. Note that the dataset name is the same for all four
> files.
>
> seagrape:/users/orf/test% h5ls -lrv uncompressed-float.h5
> Opened "uncompressed-float.h5" with sec2 driver.
> /                        Group
>     Location:  1:96
>     Links:     1
> /nbit_float              Dataset {1000/1000, 1000/1000}
>     Location:  1:800
>     Links:     1
>     Chunks:    {1000, 1000} 4000000 bytes
>     Storage:   4000000 logical bytes, 4000000 allocated bytes, 100.00% utilization
>     Type:      IEEE 32-bit big-endian float
>
> seagrape:/users/orf/test% h5ls -lrv gzip-compressed-float.h5
> Opened "gzip-compressed-float.h5" with sec2 driver.
> /                        Group
>     Location:  1:96
>     Links:     1
> /nbit_float              Dataset {1000/1000, 1000/1000}
>     Location:  1:800
>     Links:     1
>     Chunks:    {1000, 1000} 4000000 bytes
>     Storage:   4000000 logical bytes, 3394707 allocated bytes, 117.83% utilization
>     Filter-0:  deflate-1 OPT {6}
>     Type:      IEEE 32-bit big-endian float
>
> seagrape:/users/orf/test% h5ls -lrv nbit.h5
> Opened "nbit.h5" with sec2 driver.
> /                        Group
>     Location:  1:96
>     Links:     1
> /nbit_float              Dataset {1000/1000, 1000/1000}
>     Location:  1:800
>     Links:     1
>     Chunks:    {1000, 1000} 4000000 bytes
>     Storage:   4000000 logical bytes, 4000000 allocated bytes, 100.00% utilization
>     Filter-0:  nbit-5 OPT {8, 0, 1000000, 1, 4, 1, 16, 7}
>     Type:      32-bit big-endian floating-point
>                (16 bits of precision beginning at bit 7)
>                (7 zero bits at bit 0, 9 zero bits at bit 23)
>                (significant for 9 bits at bit 7, msb implied)
>                (exponent for 6 bits at bit 16, bias is 0x1f)
>                (sign bit at 22)
>
> seagrape:/users/orf/test% h5ls -lrv nbit-gzip.h5
> Opened "nbit-gzip.h5" with sec2 driver.
> /                        Group
>     Location:  1:96
>     Links:     1
> /nbit_float              Dataset {1000/1000, 1000/1000}
>     Location:  1:800
>     Links:     1
>     Chunks:    {1000, 1000} 4000000 bytes
>     Storage:   4000000 logical bytes, 876092 allocated bytes, 456.57% utilization
>     Filter-0:  nbit-5 OPT {8, 0, 1000000, 1, 4, 1, 16, 7}
>     Filter-1:  deflate-1 OPT {6}
>     Type:      32-bit big-endian floating-point
>                (16 bits of precision beginning at bit 7)
>                (7 zero bits at bit 0, 9 zero bits at bit 23)
>                (significant for 9 bits at bit 7, msb implied)
>                (exponent for 6 bits at bit 16, bias is 0x1f)
>                (sign bit at 22)
>
> Leigh
>
> --
> Leigh Orf
> Associate Professor of Atmospheric Science
> Department of Geology and Meteorology
> Central Michigan University
> Currently on sabbatical at the National Center for Atmospheric
> Research in Boulder, CO
> NCAR office phone: (303) 497-8200

_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
