Greetings Jay,

Jay Banyer wrote:
> G'day,
>
> I'm using 24-bit integers with HDF5 but finding the performance very poor.
>
> I'm new to HDF5. I'm evaluating the format for use with a radio telescope. The telescope will produce about 7TB per 12 hours of raw data, so space and write efficiency are important.
>
> The telescope system produces 24-bit signed integers. If we convert to 32 bits our files will grow by 33%, i.e. over 9.5TB instead of 7TB.

I presume you have no requirement to use 24-bit integers other than to save space and time, i.e. no oddball processor that uses 24 bits for arithmetic, etc. Here are some suggestions:

Are you using HDF5's internal compression? If not, I suggest compressing your 9.5TB of 32-bit integer files to see how much space you save. Internal compression should ultimately save you time as well, because it reduces the amount of data pushed across the slow disk channel.

You can test this very easily with the h5repack utility, which reduces your programming effort to selecting command-line options. You will likely need to experiment with the chunking parameters; read the documentation on chunking, and ask on the forum if you are still confused. Once you find a satisfactory set of settings, you'll need to modify your (presumably) C code to turn on the compression filter before the disk writes.

You might also find that adding the shuffle filter helps, since the upper byte of every 32-bit word will have one of only two values (0x00 or 0xFF, from sign extension).
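
To make the shuffle point concrete, here is a rough standard-library sketch, not HDF5 itself. It assumes uniformly random 24-bit samples as a hypothetical stand-in for telescope data, sign-extends them to 32 bits, and compares zlib output with and without a byte shuffle. The helper byte_shuffle is my own illustration of the idea behind the shuffle filter (grouping byte 0 of every element, then byte 1, and so on).

```python
import random
import struct
import zlib


def byte_shuffle(raw: bytes, width: int) -> bytes:
    """Regroup bytes so byte 0 of every element comes first, then byte 1, etc."""
    return b"".join(raw[i::width] for i in range(width))


random.seed(0)

# Hypothetical stand-in for telescope samples: uniform over the signed 24-bit range.
samples = [random.randint(-2**23, 2**23 - 1) for _ in range(50_000)]

# Store each sample as a little-endian 32-bit integer (sign-extended).
raw32 = struct.pack(f"<{len(samples)}i", *samples)

plain = zlib.compress(raw32, 6)
shuffled = zlib.compress(byte_shuffle(raw32, 4), 6)

print("32-bit raw bytes:       ", len(raw32))
print("deflate only:           ", len(plain))
print("byte shuffle + deflate: ", len(shuffled))
print("24-bit packed would be: ", 3 * len(samples))
```

Uniform random samples are close to a worst case, since the three low bytes are incompressible; real data with a smaller dynamic range should do better. Treat the sizes as indicative only and repeat the test on your own data.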

HDF5 also has a scale-offset filter. I personally dislike this approach for floats because it is lossy, but since you have integer data it might be the right choice in this situation. h5repack can, apparently, apply this filter as well.
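
For intuition only, here is a small pure-Python sketch of the idea behind a scale-offset style encoding for integers: subtract the chunk minimum, then store each offset in just enough bits to cover the range. The function names pack_chunk and unpack_chunk are hypothetical, and this is not the on-disk layout the HDF5 filter actually uses; it only shows why nothing is lost for integer data when enough bits are kept.

```python
def pack_chunk(chunk):
    """Store each value as (value - min) using the fewest bits that cover the range."""
    lo, hi = min(chunk), max(chunk)
    bits = max(1, (hi - lo).bit_length())
    acc = 0
    for i, value in enumerate(chunk):
        acc |= (value - lo) << (i * bits)
    nbytes = (len(chunk) * bits + 7) // 8
    return lo, bits, acc.to_bytes(nbytes, "little")


def unpack_chunk(lo, bits, payload, count):
    """Reverse pack_chunk: extract each offset and add the minimum back."""
    acc = int.from_bytes(payload, "little")
    mask = (1 << bits) - 1
    return [((acc >> (i * bits)) & mask) + lo for i in range(count)]


# A chunk spanning the full signed 24-bit range needs 24 bits per value.
wide = [-2**23, -5, 0, 7, 2**23 - 1]
lo, bits, payload = pack_chunk(wide)
print(bits, len(payload), "bytes instead of", 4 * len(wide))
restored = unpack_chunk(lo, bits, payload, len(wide))

# A chunk with a narrow range needs far fewer bits.
narrow = [100, 103, 101, 102]
n_lo, n_bits, n_payload = pack_chunk(narrow)
print(n_bits, len(n_payload), "bytes instead of", 4 * len(narrow))
```

The point of the sketch is that the number of bits depends on the range within each chunk, so quiet stretches of data can shrink well below 24 bits per sample.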

In both cases the end user of the data file will not need to know which filters were used.  The HDF5 library will automatically apply the correct reverse filter when the data are read back in.

More generally, a hint for evaluating HDF5 is to start with a high-level language interface. Python has an excellent open-source interface, h5py, and the commercial packages MATLAB and IDL include interfaces as well. These interfaces handle a lot of the bookkeeping for you, so that you can concentrate on evaluating your data format without getting bogged down in the details of the C API.
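
As a rough illustration of the h5py route, the sketch below writes a chunked dataset with the shuffle and gzip filters turned on, then reads it back without naming any filter. The dataset name "samples", the chunk size, the gzip level, and the random test data are all assumptions chosen for the example, not recommendations; the file lives only in memory so nothing is written to disk.

```python
import h5py
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for telescope data: signed 24-bit values held in int32.
data = rng.integers(-2**23, 2**23, size=100_000, dtype=np.int32)

# driver="core" with backing_store=False keeps the file purely in memory.
with h5py.File("demo.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset(
        "samples",
        data=data,
        chunks=(16_384,),      # chunking is required for filters; size is a guess
        shuffle=True,          # byte-shuffle filter
        compression="gzip",    # HDF5's built-in deflate filter
        compression_opts=4,    # gzip level, chosen arbitrarily here
    )
    # Reading needs no mention of the filters; the library reverses them.
    readback = dset[...]
    filters = (dset.shuffle, dset.compression)

print("filters applied:", filters)
print("round trip ok:", np.array_equal(readback, data))
```

Changing the chunks, compression_opts, and shuffle arguments and timing the writes is a quick way to explore the trade-offs before touching any C code.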

Cheers,
--dan


-- 
Daniel Kahn
Science Systems and Applications Inc.
301-867-2162



_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
