Hi Nathanael,

On Oct 11, 2011, at 10:18 AM, Nathanael Huebbe wrote:

> Hello all,
> 
> here at the DKRZ (German Climate Computing Center) we have to store
> large volumes of climate data, some of which is stored in HDF5-files.
> So, during the last few months we have been doing some research into
> climate data compression.
> 
> With very interesting results: We have been able to shrink our test data
> set to 38.76% of the original file size. This is a compression factor of
> more than 2.5 and it is significantly better than the performance of all
> the standard methods we tested (bzip2 = 54%, gzip = 58%, sldc = 66% and
> lzma = 46%). Also, we have seen that the lzma-algorithm performs much
> better than the other standard algorithms.
> 
> Even though we have constructed our methods to fit climate data, the
> features we exploited for compression are very general and likely to
> apply to other scientific data as well. This is why we are confident
> that many of you could profit from these methods as well, and we would
> be happy to share our results with the rest of the community.
> 
> The filtering mechanism in HDF5 predestines it to be the first place for
> us to share our algorithms. But first we would be very interested to see
> the lzma algorithm integrated as an optional filtering method, something
> that should be very easy to do and offers large benefits to all users.
> 
> Since we would be willing to do the necessary work, we just need to know
> the (in)formal requirements to integrate a new filter. And, of course,
> we would be very interested to hear about other recent work which
> addresses compression in HDF5, and to get in touch with whoever works on
> it.
> 
Any new feature adds to the software maintenance burden, and we (The HDF Group)
have to be very careful about it.

We have been working on a process for accepting patches and new features from
the community, but there are a few issues (such as acceptance criteria and IP)
that have to be resolved. That is, at this point we do not have criteria or
formal requirements for integrating new features.

But we do have a procedure for registering new filters with us. It is simple:

To request a filter identifier, please contact [email protected] with the
following information:

# Contact information for developer requesting the new identifier
# Short description of the new filter
# Links to any relevant information including licensing information

Information about the 3rd party filters that are registered with us will be
published on the Web (location TBD; we are working on it. It used to be on our
PBWiki, which we no longer use).

The currently registered filters are:

305     LZO (used by PyTables)
307     BZIP2 (used by PyTables)
32000   LZF (used by H5Py)
32001   BLOSC (used by PyTables)
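
As a rough illustration of how an application might use these identifiers, here is a minimal Python sketch. The table is copied from the list above. The names REGISTERED_FILTERS, describe_filter and filter_available are made up for this example and are not part of HDF5. The availability check assumes h5py is installed and uses its h5py.h5z.filter_avail call; without h5py the sketch simply reports "unknown" (None).

```python
# Registered third-party filter IDs, copied from the list above.
# The dictionary and helper names are hypothetical, for illustration only.
REGISTERED_FILTERS = {
    305: ("LZO", "PyTables"),
    307: ("BZIP2", "PyTables"),
    32000: ("LZF", "H5Py"),
    32001: ("BLOSC", "PyTables"),
}


def describe_filter(filter_id):
    """Return a readable label for a filter ID from the table above."""
    entry = REGISTERED_FILTERS.get(filter_id)
    if entry is None:
        return f"{filter_id}: not in the registered list"
    name, user = entry
    return f"{filter_id}: {name} (used by {user})"


def filter_available(filter_id):
    """Ask the local HDF5 build whether it can apply the filter.

    Returns None when h5py is not installed (assumption: h5py is the
    binding in use; other bindings expose this check differently).
    """
    try:
        import h5py
    except ImportError:
        return None
    return bool(h5py.h5z.filter_avail(filter_id))


if __name__ == "__main__":
    for fid in sorted(REGISTERED_FILTERS):
        print(describe_filter(fid), "| available locally:", filter_available(fid))
```

Being registered only reserves the identifier. Whether a given filter can actually be applied depends on what the local HDF5 installation or its bindings provide, which is why the sketch keeps the two questions separate.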

Please let me know if you have any questions.

Elena
 
> Best regards,
> Nathanael Hübbe
> 
> http://wr.informatik.uni-hamburg.de/
> http://wr.informatik.uni-hamburg.de/people/nathanael_huebbe
> 
> 
> _______________________________________________
> Hdf-forum is for HDF software users discussion.
> [email protected]
> http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
