Hello all, here at the DKRZ (German Climate Computing Center) we have to store large volumes of climate data, some of which is kept in HDF5 files. So, over the last few months we have been doing some research into climate data compression.
With very interesting results: we have been able to shrink our test data set to 38.76% of the original file size. That is a compression factor of more than 2.5, and it is significantly better than all the standard methods we tested (bzip2 = 54%, gzip = 58%, sldc = 66%, and lzma = 46%). We have also seen that the lzma algorithm performs much better than the other standard algorithms.

Even though we designed our methods to fit climate data, the features we exploit for compression are very general and likely to apply to other scientific data as well. This is why we are confident that many of you could profit from these methods too, and we would be happy to share our results with the rest of the community. With its filter mechanism, HDF5 is the natural first place for us to share our algorithms.

But first, we would be very interested to see the lzma algorithm integrated as an optional filter, something that should be easy to do and would offer large benefits to all users. Since we are willing to do the necessary work, we just need to know the (in)formal requirements for integrating a new filter. And, of course, we would be very interested to hear about other recent work that addresses compression in HDF5, and to get in touch with whoever is working on it.

Best regards,
Nathanael Hübbe
http://wr.informatik.uni-hamburg.de/
http://wr.informatik.uni-hamburg.de/people/nathanael_huebbe
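P.S. For anyone who wants to get a quick feel for the standard baselines mentioned above: the small Python sketch below runs zlib (the algorithm behind gzip), bz2, and lzma from the standard library over a synthetic, smoothly varying array of doubles. The data is made up for illustration, so the ratios it prints will not match the numbers from our test set, and our 38.76% result comes from our own method on real climate data, not from this script.

```python
# Sketch: compare Python's stdlib compressors on synthetic, climate-like
# data. Illustration only; real climate data will compress differently.
import bz2
import lzma
import math
import struct
import zlib

# Synthetic "temperature field": a smooth sine pattern around 288.15 K,
# packed as little-endian IEEE-754 doubles (a common layout for
# climate variables).
values = [288.15 + 10.0 * math.sin(i / 50.0) for i in range(100_000)]
raw = struct.pack(f"<{len(values)}d", *values)

for name, compress, decompress in [
    ("zlib", zlib.compress, zlib.decompress),
    ("bz2", bz2.compress, bz2.decompress),
    ("lzma", lzma.compress, lzma.decompress),
]:
    packed = compress(raw)
    assert decompress(packed) == raw  # lossless round trip
    print(f"{name}: {100.0 * len(packed) / len(raw):.1f}% of original size")
```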
