Hi,

This is just a note to announce a nice compressor that I've been working on lately and that I think is now ready for public testing. I've included a small example of how to add support for Blosc as a generic filter in the HDF5 library (see the hdf5/ directory in the sources).
I must warn you that, unfortunately, HDF5 cannot get the most out of Blosc because of one additional memcpy() call (after compression and before decompression). However, as this copy generally takes place in the CPU cache (mostly in L2 on modern CPUs), the effect is not very important.

The PyTables community has already tested it quite intensively, both stand-alone and inside PyTables, and I am happy to say that it seems to work nicely so far.

Enjoy!

===============================================================
 Announcing Blosc 1.0rc1
 A blocking, shuffling and lossless compression library
===============================================================

:Author: Francesc Alted i Abad
:Contact: [email protected]
:URL: http://blosc.pytables.org

What is new?
============

Everything :-) This is the first public release of a project that started more than a year ago and that, after very intensive testing (several hundred TB compressed and decompressed without a glitch), is finally getting ready for public consumption.

This is Release Candidate 1 for the Blosc 1.0 release, so please test it and report back any problems you may have with it.

What is it?
===========

Blosc [1]_ is a high performance compressor optimized for binary data. It has been designed to transmit data to the processor cache faster than the traditional, non-compressed, direct memory fetch approach via a memcpy() OS call. Blosc is the first compressor (that I'm aware of) that is meant not only to reduce the size of large datasets on-disk or in-memory, but also to accelerate memory-bound computations.

It uses the blocking technique (as described in [2]_) to reduce activity on the memory bus as much as possible. In short, this technique works by dividing datasets into blocks that are small enough to fit in the caches of modern processors and performing compression / decompression there.
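
To make the blocking idea concrete, here is a minimal sketch in Python. It is not Blosc's implementation: zlib from the standard library stands in for Blosc's own compressor, and the 32 KB block size and the function names ``compress_blocks`` / ``decompress_blocks`` are assumptions chosen only for illustration.

```python
import zlib
from typing import List

# Assumed block size for illustration: small enough to fit in a typical CPU cache.
BLOCK_SIZE = 32 * 1024


def compress_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Split `data` into fixed-size blocks and compress each one independently.

    zlib is only a stand-in here; Blosc uses its own compressor.
    """
    return [
        zlib.compress(data[start:start + block_size])
        for start in range(0, len(data), block_size)
    ]


def decompress_blocks(blocks: List[bytes]) -> bytes:
    """Decompress each block in order and join the results."""
    return b"".join(zlib.decompress(block) for block in blocks)


if __name__ == "__main__":
    payload = bytes(range(256)) * 1000  # 256,000 bytes of repetitive binary data
    compressed = compress_blocks(payload)
    assert decompress_blocks(compressed) == payload
    print(len(compressed), "blocks,", sum(len(b) for b in compressed), "bytes")
```

Because every block is compressed independently in this sketch, a single block can be decompressed without touching the rest of the dataset, which is what lets the work stay within a cache-sized working set.
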
Blosc also leverages, if available, SIMD instructions (SSE2) and the multi-threading capabilities of CPUs, in order to accelerate the compression / decompression process as much as possible.

You can see some recent benchmarks of Blosc performance in [3]_.

Blosc is distributed under the MIT license; see the LICENSES directory for details.

.. [1] http://blosc.pytables.org
.. [2] http://www.pytables.org/docs/CISE-12-2-ScientificPro.pdf
.. [3] http://blosc.pytables.org/trac/wiki/SyntheticBenchmarks

Download sources
================

Please go to:

http://blosc.pytables.org/sources/

and download the most stable release from there.

--
Francesc Alted

_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
