[Hdf-forum] ANN: Blosc 1.0rc1: a blocking, shuffling and loss-less compression library

Francesc Alted Mon, 28 Jun 2010 08:44:49 -0700

Hi,

This is only a note to announce a nice compressor that I've been working on
lately, and which I think is now ready for public testing.  I've included a
small example of how to add support for Blosc as a generic filter in the
HDF5 library (see the hdf5/ directory in the sources).


I must warn you that, unfortunately, HDF5 cannot get the most out of Blosc
because of one additional memcpy() call after compression / before
decompression.  However, as this copy generally takes place in the
CPU cache (mostly in L2 on modern CPUs), its impact is usually minor.

The PyTables community has already tested it quite intensively, both
standalone and inside PyTables, and I am happy to say that it seems to work
nicely so far.

Enjoy!

===============================================================
 Announcing Blosc 1.0rc1
 A blocking, shuffling and lossless compression library
===============================================================

:Author: Francesc Alted i Abad
:Contact: [email protected]
:URL: http://blosc.pytables.org


What is new?
============

Everything :-) This is the first public release of a project that
started more than a year ago and that, after very intensive testing
(several hundred TB compressed and decompressed without a glitch),
is finally getting ready for public consumption.

This is Release Candidate 1 for the Blosc 1.0 release, so please test it
and report back any problems you may have with it.


What is it?
===========

Blosc [1]_ is a high performance compressor optimized for binary data.
It has been designed to transmit data to the processor cache faster
than the traditional, non-compressed, direct memory fetch approach via
a memcpy() call.  Blosc is the first compressor (that I'm aware of)
that is meant not only to reduce the size of large datasets on-disk or
in-memory, but also to accelerate memory-bound computations.
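The announcement's title also names shuffling, which the paragraph above does not spell out. Assuming it takes the common byte-shuffle form (byte k of every fixed-size element is gathered into the k-th stream, so that similar bytes of neighbouring numeric values end up next to each other), a minimal portable sketch could look like this. The names `shuffle_bytes` and `unshuffle_bytes` are hypothetical and are not Blosc's API; the text mentions SSE2, but this sketch is plain scalar C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte shuffle: gather byte k of every typesize-byte element
   into stream k, i.e. dest[k * nelem + i] = src[i * typesize + k].
   Trailing bytes that do not form a whole element are copied unchanged. */
static void shuffle_bytes(size_t typesize, size_t nbytes,
                          const uint8_t *src, uint8_t *dest)
{
    assert(typesize > 0);
    size_t nelem = nbytes / typesize;
    for (size_t i = 0; i < nelem; i++)
        for (size_t k = 0; k < typesize; k++)
            dest[k * nelem + i] = src[i * typesize + k];
    for (size_t j = nelem * typesize; j < nbytes; j++)
        dest[j] = src[j];
}

/* Inverse of shuffle_bytes: scatter the byte streams back into elements. */
static void unshuffle_bytes(size_t typesize, size_t nbytes,
                            const uint8_t *src, uint8_t *dest)
{
    assert(typesize > 0);
    size_t nelem = nbytes / typesize;
    for (size_t i = 0; i < nelem; i++)
        for (size_t k = 0; k < typesize; k++)
            dest[i * typesize + k] = src[k * nelem + i];
    for (size_t j = nelem * typesize; j < nbytes; j++)
        dest[j] = src[j];
}
```

For instance, on a little-endian machine an array of 32-bit integers that are all below 256 shuffles into one stream of low-order bytes followed by three streams consisting entirely of zero bytes, which a fast codec can then compress very cheaply.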

It uses the blocking technique (as described in [2]_) to reduce
activity on the memory bus as much as possible.  In short, this
technique works by dividing datasets into blocks that are small enough
to fit in the caches of modern processors and performing compression /
decompression there.  It also leverages, if available, SIMD
instructions (SSE2) and the multi-threading capabilities of CPUs in
order to accelerate the compression / decompression process as much as
possible.
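The blocking idea can be sketched as a loop that hands a codec one cache-sized block at a time, so that the working set stays in cache while each block is processed. This is a minimal sketch under stated assumptions: the 32 KB `BLOCK_SIZE`, the `for_each_block` driver and the `count_zeros` callback (a stand-in for a real per-block compressor) are all hypothetical and are not part of Blosc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical block size, assumed small enough to stay in L2 cache. */
#define BLOCK_SIZE ((size_t)32 * 1024)

/* Callback applied to one block; a real codec would compress it here. */
typedef void (*block_fn)(const uint8_t *block, size_t len, void *ctx);

/* Visit buf in consecutive blocks of at most blocksize bytes and return
   the number of blocks visited. The last block may be shorter. */
static size_t for_each_block(const uint8_t *buf, size_t nbytes,
                             size_t blocksize, block_fn fn, void *ctx)
{
    size_t nblocks = 0;
    assert(blocksize > 0); /* a zero block size would never advance */
    for (size_t off = 0; off < nbytes; off += blocksize) {
        size_t remaining = nbytes - off;
        size_t len = remaining < blocksize ? remaining : blocksize;
        fn(buf + off, len, ctx);
        nblocks++;
    }
    return nblocks;
}

/* Stand-in per-block work: count zero bytes, accumulating into *ctx. */
static void count_zeros(const uint8_t *block, size_t len, void *ctx)
{
    size_t *total = ctx;
    for (size_t i = 0; i < len; i++)
        if (block[i] == 0)
            (*total)++;
}
```

Because each block is handled independently, the blocks could also be distributed among threads, which is one plausible reading of the multi-threading remark above; the sketch itself is single-threaded.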

You can see some recent benchmarks of Blosc performance in [3]_.

Blosc is distributed under the MIT license; see the LICENSES
directory for details.

.. [1] http://blosc.pytables.org
.. [2] http://www.pytables.org/docs/CISE-12-2-ScientificPro.pdf
.. [3] http://blosc.pytables.org/trac/wiki/SyntheticBenchmarks


Download sources
================

Please go to:

http://blosc.pytables.org/sources/

and download the most stable release from there.


-- 
Francesc Alted

_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
