Re: [Numpy-discussion] ANN: BLZ 0.6.1 has been released

2014-01-26 Thread Dinesh Vadhia
For me, binary data wrt arrays means that data values are [0|1].  Is this
what is meant in "The compression process is carried out internally by
Blosc, a high-performance compressor that is optimized for binary data."?
 

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ANN: BLZ 0.6.1 has been released

2014-01-26 Thread Valentin Haenel
Hi Dinesh Vadhia,

* Dinesh Vadhia <dineshbvad...@hotmail.com> [2014-01-26]:
> For me, binary data wrt arrays means that data values are [0|1].  Is this
> what is meant in "The compression process is carried out internally by
> Blosc, a high-performance compressor that is optimized for binary data."?

I believe the term 'binary data' in this context refers to numerical
data (e.g. floats and ints), in the sense that it is not ASCII or
other text.

Blosc is especially well suited for this kind of data due to its
optional shuffle filter. This filter reorganizes the bytes of the data
to be compressed in order of significance: the first byte of every
value, then the second byte of every value, and so on. For this filter
to be useful, each data value must be composed of multiple bytes, e.g.
int64.  For data values that are composed of a single byte, e.g. int8 or
char, there is nothing to reorder, so the filter brings no benefit.
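A minimal NumPy sketch of the shuffle idea follows. It is not Blosc's implementation: zlib from the Python standard library stands in for Blosc's codecs, and `shuffle_bytes`/`unshuffle_bytes` are hypothetical helper names. It also shows why single-byte types gain nothing: with an itemsize of 1 the shuffle is the identity.

```python
import zlib

import numpy as np


def shuffle_bytes(arr: np.ndarray) -> bytes:
    """Group bytes by their position within each element (hypothetical helper).

    For N items of `itemsize` bytes, the output holds byte 0 of every item,
    then byte 1 of every item, and so on.
    """
    raw = np.ascontiguousarray(arr).view(np.uint8).reshape(-1, arr.itemsize)
    return raw.T.tobytes()


def unshuffle_bytes(buf: bytes, dtype, count: int) -> np.ndarray:
    """Invert shuffle_bytes for a 1-D array of `count` items of `dtype`."""
    itemsize = np.dtype(dtype).itemsize
    planes = np.frombuffer(buf, dtype=np.uint8).reshape(itemsize, count)
    return planes.T.copy().view(dtype).reshape(count)


values = np.arange(100_000, dtype=np.int64)   # smooth, multi-byte integers
plain = zlib.compress(values.tobytes())
shuffled = zlib.compress(shuffle_bytes(values))
print(len(plain), len(shuffled))              # the shuffled stream compresses far better here
```

After shuffling, the high-order bytes of these int64 values form long runs of zeros, which is exactly what a general-purpose compressor exploits.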

Hope that helps,

V-


[Numpy-discussion] ANN: BLZ 0.6.1 has been released

2014-01-25 Thread Francesc Alted
Announcing BLZ 0.6 series
=========================

What it is
----------

BLZ is a chunked container for numerical data.  Chunking allows for
efficient enlarging/shrinking of the data container.  In addition, the
data can be compressed to reduce memory/disk needs.  The compression
process is carried out internally by Blosc, a high-performance
compressor that is optimized for binary data.
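To illustrate why chunking makes enlarging and shrinking cheap, here is a toy sketch. `ChunkedArray` is a hypothetical class, not BLZ's `barray`, and it uses zlib in place of Blosc. Appending only fills and compresses new chunks, and trimming drops whole chunks, so existing compressed chunks never need rewriting.

```python
import zlib

import numpy as np


class ChunkedArray:
    """Toy 1-D container made of fixed-size, zlib-compressed chunks.

    Hypothetical illustration only: not BLZ's `barray`, and zlib stands in
    for Blosc.
    """

    def __init__(self, dtype, chunklen=1024):
        self.dtype = np.dtype(dtype)
        self.chunklen = chunklen
        self.chunks = []                                # compressed full chunks
        self.leftover = np.empty(0, dtype=self.dtype)   # uncompressed tail

    def __len__(self):
        return len(self.chunks) * self.chunklen + len(self.leftover)

    def append(self, values):
        """Grow the container; previously compressed chunks are not touched."""
        buf = np.concatenate([self.leftover, np.asarray(values, dtype=self.dtype)])
        nfull = len(buf) // self.chunklen
        for i in range(nfull):
            chunk = buf[i * self.chunklen:(i + 1) * self.chunklen]
            self.chunks.append(zlib.compress(chunk.tobytes()))
        self.leftover = buf[nfull * self.chunklen:].copy()

    def trim(self, n):
        """Shrink by n items: drop whole chunks, decompress at most one."""
        newlen = max(len(self) - n, 0)
        nfull, rest = divmod(newlen, self.chunklen)
        if nfull < len(self.chunks):
            tail = np.frombuffer(zlib.decompress(self.chunks[nfull]), dtype=self.dtype)
            self.leftover = tail[:rest].copy()
        else:
            self.leftover = self.leftover[:rest].copy()
        del self.chunks[nfull:]

    def __getitem__(self, i):
        """Return item i, decompressing only the chunk that holds it."""
        if not 0 <= i < len(self):
            raise IndexError(i)
        nchunk, offset = divmod(i, self.chunklen)
        if nchunk == len(self.chunks):
            return self.leftover[offset]
        chunk = np.frombuffer(zlib.decompress(self.chunks[nchunk]), dtype=self.dtype)
        return chunk[offset]

    def compressed_nbytes(self):
        """Bytes used by the compressed chunks plus the uncompressed tail."""
        return sum(len(c) for c in self.chunks) + self.leftover.nbytes


ca = ChunkedArray(np.int64, chunklen=1000)
ca.append(np.arange(2500))        # 2 full chunks compressed, 500 items left in the tail
ca.append(np.arange(2500, 5000))  # tail is topped up; earlier chunks are untouched
print(len(ca), ca[4321])          # 5000 4321
ca.trim(1500)                     # drop whole chunks, decompress only the new tail
print(len(ca), ca[3499])          # 3500 3499
```

Keeping an uncompressed tail buffer avoids recompressing a partially filled chunk on every append.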

The main objects in BLZ are `barray` and `btable`.  `barray` is meant
for storing multidimensional homogeneous datasets efficiently.
`barray` objects provide the foundations for building `btable`
objects, where each column is made of a single `barray`.  Facilities
are provided for iterating, filtering and querying `btables` in an
efficient way.  You can find more info about `barray` and `btable` in
the tutorial:

http://blz.pydata.org/blz-manual/tutorial.html
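As a rough illustration of the column-per-container design, the sketch below keeps each column in its own NumPy array and runs a boolean query over them. `ColumnTable` and its `where` method are hypothetical and do not reproduce the `btable` API; the point is that a query only has to touch the arrays for the columns it names.

```python
import numpy as np


class ColumnTable:
    """Toy column store: each column is a separate 1-D NumPy array.

    Hypothetical stand-in for the one-container-per-column idea; not BLZ's API.
    """

    def __init__(self, **columns):
        lengths = {len(col) for col in columns.values()}
        if len(lengths) != 1:
            raise ValueError("all columns must have the same, nonzero count")
        self.cols = {name: np.asarray(col) for name, col in columns.items()}

    def where(self, mask_fn, outcols):
        """Return rows (tuples of `outcols`) for which mask_fn(cols) is True."""
        mask = mask_fn(self.cols)                      # boolean array over rows
        selected = [self.cols[name][mask].tolist() for name in outcols]
        return list(zip(*selected))


t = ColumnTable(x=np.arange(10), y=np.arange(10) ** 2)
print(t.where(lambda c: (c["x"] > 2) & (c["y"] < 30), ["x", "y"]))
# [(3, 9), (4, 16), (5, 25)]
```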

BLZ can use numexpr internally to accelerate many vector and query
operations, whether the data is in memory or on disk (pure NumPy can
be used for this too).  In the future, the plan is to use Numba as the
computational kernel and to provide better Blaze
(http://blaze.pydata.org) integration.
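The sketch below shows the block-at-a-time evaluation pattern in pure NumPy, under the assumption that the expression is elementwise. `eval_chunked` is a hypothetical helper, not part of BLZ; numexpr applies a similar blocking strategy internally, in compiled code, when you call `numexpr.evaluate()`. Working one block at a time keeps temporaries small, which is also what makes the same loop usable over chunks read from disk.

```python
import numpy as np


def eval_chunked(func, arrays, chunklen=4096):
    """Apply an elementwise `func` to equal-length 1-D arrays, one block at a time.

    Hypothetical helper: temporaries created inside `func` are at most
    `chunklen` elements long instead of full-size intermediates.
    """
    n = len(arrays[0])
    out = None
    for start in range(0, n, chunklen):
        stop = min(start + chunklen, n)
        block = np.asarray(func(*(a[start:stop] for a in arrays)))
        if out is None:
            # Allocate the result once, with the dtype the expression produces.
            out = np.empty(n, dtype=block.dtype)
        out[start:stop] = block
    if out is None:  # zero-length input: nothing was evaluated in the loop
        out = np.asarray(func(*arrays))
    return out


a = np.random.default_rng(0).random(1_000_000)
b = np.random.default_rng(1).random(1_000_000)
r = eval_chunked(lambda x, y: 2 * x + 3 * y, [a, b], chunklen=65_536)
print(np.allclose(r, 2 * a + 3 * b))  # True
```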


What's new
----------

BLZ has been branched off from the Blaze project
(http://blaze.pydata.org).  BLZ was meant as a persistent format and
library for I/O in Blaze.  BLZ in Blaze is based on the previous carray
0.5 release, which is why this new version is labeled 0.6.

BLZ supports completely transparent on-disk storage in addition to
in-memory storage.  That means that *everything* that can be done with
the in-memory container can be done using the disk as well.

The advantage of a disk-based container is that the addressable space
is much larger than just your available memory.  Also, as BLZ uses a
chunked and compressed data layout based on the super-fast Blosc
compression library, the data access speed is very good.
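A sketch of the general idea behind a disk-based chunked container: each compressed chunk lives in its own file, and reading one element only decompresses the chunk that holds it. The one-file-per-chunk layout, the file names and the helper functions are all hypothetical, and zlib stands in for Blosc; this is not the bloscpack format described in the manual.

```python
import tempfile
import zlib
from pathlib import Path

import numpy as np


def save_chunks(arr, directory, chunklen):
    """Write `arr` as one zlib-compressed file per chunk (hypothetical layout)."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    for i, start in enumerate(range(0, len(arr), chunklen)):
        chunk = np.ascontiguousarray(arr[start:start + chunklen])
        (directory / f"chunk_{i:06d}.z").write_bytes(zlib.compress(chunk.tobytes()))


def read_item(directory, index, dtype, chunklen):
    """Return arr[index] by reading and decompressing only the chunk that holds it."""
    nchunk, offset = divmod(index, chunklen)
    raw = (Path(directory) / f"chunk_{nchunk:06d}.z").read_bytes()
    return np.frombuffer(zlib.decompress(raw), dtype=dtype)[offset]


with tempfile.TemporaryDirectory() as tmp:
    data = np.arange(10_000, dtype=np.float64)
    save_chunks(data, tmp, chunklen=1024)
    print(read_item(tmp, 5000, np.float64, chunklen=1024))  # 5000.0
```

Because only the touched chunk is loaded, the array as a whole can be larger than available memory.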

The format chosen for the persistence layer is based on the
'bloscpack' library and is described in the "Persistent format for BLZ"
chapter of the user manual ('docs/source/persistence-format.rst').
More about Bloscpack here: https://github.com/esc/bloscpack

You may want to know more about BLZ in this blog entry:
http://continuum.io/blog/blz-format

In this version, support for Blosc 1.3 has been added, which means
that a new `cname` parameter has been added to the `bparams` class so
that you can select your preferred compressor from 'blosclz', 'lz4',
'lz4hc', 'snappy' and 'zlib'.
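To illustrate selecting a codec by name, in the spirit of the new `cname` parameter, here is a sketch using codecs from the Python standard library. Apart from 'zlib', these are stand-ins (bz2, lzma), because 'blosclz', 'lz4', 'lz4hc' and 'snappy' are not in the standard library; `COMPRESSORS` and `compress_with` are hypothetical names and not the `bparams` API.

```python
import bz2
import lzma
import zlib

import numpy as np

# Hypothetical name -> (compress, decompress) table; stdlib stand-ins only,
# not the set of codecs BLZ/Blosc actually offers.
COMPRESSORS = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def compress_with(cname, data: bytes) -> bytes:
    """Compress `data` with the codec registered under `cname`."""
    try:
        compress, _ = COMPRESSORS[cname]
    except KeyError:
        raise ValueError(f"unknown compressor {cname!r}; "
                         f"choose from {sorted(COMPRESSORS)}") from None
    return compress(data)


payload = np.arange(50_000, dtype=np.int64).tobytes()
for cname in COMPRESSORS:
    print(cname, len(compress_with(cname, payload)))
```

Printing the compressed sizes for a representative payload gives a quick first comparison; speed would need to be timed separately.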

Also, many bugs have been fixed, providing a much smoother experience.

CAVEAT: The BLZ/bloscpack format is still evolving, so don't rely on
forward compatibility of the format, at least until 1.0, when the
internal format will be declared frozen.


Resources
---------

Visit the main BLZ repository at:
http://github.com/ContinuumIO/blz

Read the online docs at:
http://blz.pydata.org/blz-manual/index.html

Home of Blosc compressor:
http://www.blosc.org

User's mail list:
blaze-...@continuum.io



Enjoy!

Francesc Alted
Continuum Analytics, Inc.
