Leigh,

Thank you for your report. We will be looking into the problem shortly.

Elena
On Nov 9, 2010, at 4:23 PM, Leigh Orf wrote:

> I am experimenting with the n-bit filter on floating-point data to reduce 
> file size, and have been tweaking the datatype precision settings to lower 
> the stored precision. Mostly, things are working; however, I am a bit confused.
> 
> In the documentation 
> (http://www.hdfgroup.org/HDF5/doc/RM/RM_H5P.html#Property-SetNbit) I read the 
> following statement:
> 
> "By nature, the N-Bit filter should not be used together with other I/O 
> filters"
> 
> I also read the following in the documentation 
> (http://www.hdfgroup.org/HDF5/doc/UG/10_Datasets.html), in the discussion of 
> integer n-bit compression:
> 
> "After n-bit compression, none of these discarded bits, known as padding bits 
> will be stored on disk."
> 
> While no such statement is found under the floating point discussion below 
> that, I would assume it also holds for floating point data.
> 
> There is also this statement:
> 
> "The n-bit decompression algorithm is very similar to n-bit compression. The 
> only difference is that at the byte level, compression packs out all padding 
> bits and stores only significant bits into a continuous buffer (unsigned char) 
> while decompression unpacks significant bits and inserts padding bits (zeros) 
> at the proper positions to recover the data bytes as they existed before 
> compression."
> 
> So, when I look at all these statements combined, I am led to believe that 
> just applying n-bit compression should give me a good reduction in file size. 
> However, that does not happen.
> 
> I took the example C code found at 
> http://www.hdfgroup.org/HDF5/doc/UG/10_Datasets.html and modified it to fill a 
> 2D array of size 1000x1000 with a known function (something involving logs and 
> sines to give variability roughly like what I find in my real scientific data 
> sets).
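> 
> The relevant part of my modified program looks roughly like this (just a 
> sketch reconstructed from the UG example, with error checking trimmed and an 
> illustrative log/sin fill function; the type parameters match the h5ls 
> output further down):
> 
>     #include <hdf5.h>
>     #include <math.h>
>     #include <stdlib.h>
> 
>     int main(void)
>     {
>         hsize_t dims[2] = {1000, 1000};
>         float  *buf = malloc(1000 * 1000 * sizeof(float));
>         size_t  i, j;
> 
>         /* fill with a smooth analytic function (stand-in for the real fields) */
>         for (i = 0; i < 1000; i++)
>             for (j = 0; j < 1000; j++)
>                 buf[i*1000 + j] = (float)(log(1.0 + i) * sin(0.01 * j));
> 
>         /* 32-bit float with 16 bits of precision at offset 7: sign bit at 22,
>          * 6 exponent bits at 16 (bias 31), 9 mantissa bits at 7.
>          * Properties are set in the order fields -> offset -> precision -> size. */
>         hid_t dtype = H5Tcopy(H5T_IEEE_F32BE);
>         H5Tset_fields(dtype, 22, 16, 6, 7, 9);
>         H5Tset_offset(dtype, 7);
>         H5Tset_precision(dtype, 16);
>         H5Tset_size(dtype, 4);
>         H5Tset_ebias(dtype, 31);
> 
>         hid_t file  = H5Fcreate("nbit.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
>         hid_t space = H5Screate_simple(2, dims, NULL);
>         hid_t dcpl  = H5Pcreate(H5P_DATASET_CREATE);
>         H5Pset_chunk(dcpl, 2, dims);   /* one 1000x1000 chunk, as in the h5ls output */
>         H5Pset_nbit(dcpl);             /* for nbit-gzip.h5 I also call H5Pset_deflate(dcpl, 6) */
> 
>         hid_t dset = H5Dcreate2(file, "nbit_float", dtype, space,
>                                 H5P_DEFAULT, dcpl, H5P_DEFAULT);
>         H5Dwrite(dset, H5T_NATIVE_FLOAT, H5S_ALL, H5S_ALL, H5P_DEFAULT, buf);
> 
>         H5Dclose(dset); H5Pclose(dcpl); H5Sclose(space);
>         H5Tclose(dtype); H5Fclose(file); free(buf);
>         return 0;
>     }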
> 
> Below are four hdf5 files:
> seagrape:/users/orf/test% ls -l *.h5
> 
> -rw-r--r-- 1 orf users 4004016 Nov  9 15:01 uncompressed-float.h5
> -rw-r--r-- 1 orf users 4004016 Nov  9 15:01 nbit.h5
> -rw-r--r-- 1 orf users 3398723 Nov  9 15:02 gzip-compressed-float.h5
> -rw-r--r-- 1 orf users  880108 Nov  9 15:02 nbit-gzip.h5
> 
> uncompressed-float.h5 is without any compression whatsoever. As expected, the 
> file is roughly 1000x1000x4 bytes in size.
> nbit.h5 has the n-bit filter applied. It is the same size!
> gzip-compressed-float.h5 is the floating-point data with gzip (level 6) 
> applied, so it's lossless.
> nbit-gzip.h5 has the nbit filter followed by the gzip filter. Lots o' 
> compression!!
> 
> So, it seems to me that the nbit filter applied to floating-point data stores 
> the zeroed padding bits...? I thought I'd see the file size reduction without 
> having to apply gzip compression.
> 
> I have done several tests, and I am sure the data in nbit.h5 has lost 
> precision: subtracting the lossless data from it gives nonzero differences.
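> 
> The check is nothing fancy, roughly the following (read_field is just a 
> small helper here), reading /nbit_float back from both files as native 
> floats and printing the maximum absolute difference:
> 
>     #include <hdf5.h>
>     #include <math.h>
>     #include <stdio.h>
>     #include <stdlib.h>
> 
>     #define N (1000 * 1000)
> 
>     /* open the named file and read /nbit_float into buf as native floats */
>     static void read_field(const char *fname, float *buf)
>     {
>         hid_t f = H5Fopen(fname, H5F_ACC_RDONLY, H5P_DEFAULT);
>         hid_t d = H5Dopen2(f, "/nbit_float", H5P_DEFAULT);
>         H5Dread(d, H5T_NATIVE_FLOAT, H5S_ALL, H5S_ALL, H5P_DEFAULT, buf);
>         H5Dclose(d); H5Fclose(f);
>     }
> 
>     int main(void)
>     {
>         float *a = malloc(N * sizeof(float));
>         float *b = malloc(N * sizeof(float));
>         double maxdiff = 0.0;
>         size_t i;
> 
>         read_field("uncompressed-float.h5", a);
>         read_field("nbit.h5", b);
>         for (i = 0; i < N; i++) {
>             double d = fabs((double)a[i] - (double)b[i]);
>             if (d > maxdiff) maxdiff = d;
>         }
>         printf("max |uncompressed - nbit| = %g\n", maxdiff);
> 
>         free(a); free(b);
>         return 0;
>     }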
> 
> Some more info. Note that the dataset name is the same in all four files.
> 
> seagrape:/users/orf/test% h5ls -lrv uncompressed-float.h5
> Opened "uncompressed-float.h5" with sec2 driver.
> /                        Group
>     Location:  1:96
>     Links:     1
> /nbit_float              Dataset {1000/1000, 1000/1000}
>     Location:  1:800
>     Links:     1
>     Chunks:    {1000, 1000} 4000000 bytes
>     Storage:   4000000 logical bytes, 4000000 allocated bytes, 100.00% 
> utilization
>     Type:      IEEE 32-bit big-endian float
> seagrape:/users/orf/test% h5ls -lrv gzip-compressed-float.h5
> Opened "gzip-compressed-float.h5" with sec2 driver.
> /                        Group
>     Location:  1:96
>     Links:     1
> /nbit_float              Dataset {1000/1000, 1000/1000}
>     Location:  1:800
>     Links:     1
>     Chunks:    {1000, 1000} 4000000 bytes
>     Storage:   4000000 logical bytes, 3394707 allocated bytes, 117.83% 
> utilization
>     Filter-0:  deflate-1 OPT {6}
>     Type:      IEEE 32-bit big-endian float
> seagrape:/users/orf/test% h5ls -lrv nbit.h5
> Opened "nbit.h5" with sec2 driver.
> /                        Group
>     Location:  1:96
>     Links:     1
> /nbit_float              Dataset {1000/1000, 1000/1000}
>     Location:  1:800
>     Links:     1
>     Chunks:    {1000, 1000} 4000000 bytes
>     Storage:   4000000 logical bytes, 4000000 allocated bytes, 100.00% 
> utilization
>     Filter-0:  nbit-5 OPT {8, 0, 1000000, 1, 4, 1, 16, 7}
>     Type:      32-bit big-endian floating-point
>                (16 bits of precision beginning at bit 7)
>                (7 zero bits at bit 0, 9 zero bits at bit 23)
>                (significant for 9 bits at bit 7, msb implied)
>                (exponent for 6 bits at bit 16, bias is 0x1f)
>                (sign bit at 22)
> seagrape:/users/orf/test% h5ls -lrv nbit-gzip.h5
> Opened "nbit-gzip.h5" with sec2 driver.
> /                        Group
>     Location:  1:96
>     Links:     1
> /nbit_float              Dataset {1000/1000, 1000/1000}
>     Location:  1:800
>     Links:     1
>     Chunks:    {1000, 1000} 4000000 bytes
>     Storage:   4000000 logical bytes, 876092 allocated bytes, 456.57% 
> utilization
>     Filter-0:  nbit-5 OPT {8, 0, 1000000, 1, 4, 1, 16, 7}
>     Filter-1:  deflate-1 OPT {6}
>     Type:      32-bit big-endian floating-point
>                (16 bits of precision beginning at bit 7)
>                (7 zero bits at bit 0, 9 zero bits at bit 23)
>                (significant for 9 bits at bit 7, msb implied)
>                (exponent for 6 bits at bit 16, bias is 0x1f)
>                (sign bit at 22)
> 
> Leigh
> 
> 
> -- 
> Leigh Orf
> Associate Professor of Atmospheric Science
> Department of Geology and Meteorology
> Central Michigan University
> Currently on sabbatical at the National Center for Atmospheric Research in 
> Boulder, CO
> NCAR office phone: (303) 497-8200

_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
