Re: [Numpy-discussion] Comparison changes

2014-01-25 Thread Sebastian Berg
On Sat, 2014-01-25 at 00:18 +, Nathaniel Smith wrote:
 On 25 Jan 2014 00:05, Sebastian Berg sebast...@sipsolutions.net
 wrote:
 
  Hi all,
 
  in https://github.com/numpy/numpy/pull/3514 I proposed some changes
 to
  the comparison operators. This includes:
 
  1. Comparison with None will broadcast in the future, so that
  `arr == None` will actually compare all elements to None. (A
  FutureWarning for now)
 
  2. I added that == and != will give a FutureWarning when an error was
  raised. In the future they should not silence these errors anymore (for
  example shape mismatches).
 
 This can just be a DeprecationWarning, because the only change is to
 raise more errors than before.
 
Right, that is already the case.

  3. We used to use PyObject_RichCompareBool for equality, which includes
  an identity check. I propose to not do that identity check since we have
  elementwise equality (returning an object array for objects would be
  nice in some ways, but I think that is only an option for a dedicated
  function). The reason is that for example
 
   >>> a = np.array([np.array([1, 2, 3]), 1])
   >>> b = np.array([np.array([1, 2, 3]), 1])
   >>> a == b
 
  will happen to work only if it happens to be that `a[0] is b[0]`. This
  currently has no deprecation, since the logic is in the inner loop and
  I am not sure how easy it is to add a warning there.
 
 Surely any environment where we can call PyObject_RichCompareBool is
 an environment where we can issue a warning...?
 
Right, I suppose an extra identity check and comparing it with the other
result is indeed no problem. So I think I will add that.
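
A rough sketch of the two behaviours in question (outputs are
illustrative, not copied from a session):

>>> import numpy as np
>>> arr = np.arange(3)
>>> arr == None            # proposed future behaviour: broadcast elementwise
array([False, False, False], dtype=bool)

>>> inner = np.array([1, 2, 3])
>>> a = np.array([inner, 1], dtype=object)
>>> b = np.array([inner, 1], dtype=object)
>>> a == b                 # only "works" today because a[0] is b[0]
array([ True,  True], dtype=bool)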

- Sebastian


 -n
 


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Catching out-of-memory error before it happens

2014-01-25 Thread Daπid
On 24 January 2014 23:09, Dinesh Vadhia dineshbvad...@hotmail.com wrote:

  Francesc: Thanks. I looked at numexpr a few years back but it didn't
 support array slicing/indexing.  Has that changed?


No, but you can do it yourself.

import numpy as np
import numexpr as ne

big_array = np.empty(10**8)   # a large array; the exact size is illustrative
piece = big_array[30:-50]     # a view: slicing shares the buffer
ne.evaluate('sqrt(piece)')

Here, creating piece does not increase memory use, as slicing shares the
original data (well, actually, it adds a mere 80 bytes, the overhead of an
array).
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] using loadtxt to load a text file in to a numpy array

2014-01-25 Thread Charles R Harris
On Thu, Jan 23, 2014 at 11:49 AM, Chris Barker chris.bar...@noaa.gov wrote:

 Thanks for poking into this all. I've lost track a bit, but I think:

 The 'S' type is clearly broken on py3 (at least). I think that gives us
 room to change it, and backward compatibility is less of an issue because
 it's broken already -- do we need to preserve bug-for-bug compatibility?
 Maybe, but I suspect in this case, not -- the code that works fine on py3
 with the 'S' type has probably just been lucky not to hit the issues yet.

 And no matter how you slice it, code being ported to py3 needs to deal
 with text handling issues.

 But here is where we stand:

 The 'S' dtype:

  - was designed for one-byte-per-char text data.
  - was mapped to the py2 string type.
  - used the classic C null-terminated approach.
  - can be used for arbitrary bytes (as the py2 string type can), but not
 quite, as it truncates null bytes -- so it's really a bad idea to use it
 that way.

 Under py3:
   The 'S' type maps to the py3 bytes type, because that's the closest to
 the py2 string type. But it also does some inconsistent things with
 encoding, and still treats a lot of things as text. Yet the py3 bytes
 type does not have the same text handling as the py2 string type, so
 things like:

 s = 'a string'
 np.array((s,), dtype='S')[0] == s

 gives you False on py3, rather than True as on py2. This is because the
 py3 string is translated to the 'S' type (presumably with the default
 encoding -- another thing that is maybe not a good idea), which returns a
 bytes object that does not compare true to a py3 string. You can work
 around this with various calls to encode() and decode(), and/or by using
 b'a string', but that is ugly, kludgy, and doesn't work well with the py3
 text model.
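
 (A sketch of what this looks like in a py3 session, workaround included;
 the exact reprs are illustrative:)

  >>> s = 'a string'
  >>> np.array((s,), dtype='S')[0]
  b'a string'
  >>> np.array((s,), dtype='S')[0] == s
  False
  >>> np.array((s,), dtype='S')[0].decode('ascii') == s   # the kludgy workaround
  True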


  The py2 -> py3 transition separated bytes and strings: strings are
 unicode, and bytes are not to be used for text (directly). While there is
 some text-related functionality still in bytes, the core devs are quite
 clear that that is for special cases only, and not for general text
 processing.

 I don't think numpy should fight this, but rather embrace the py3 text
 model. The most natural way to do that is to use the existing 'U' dtype for
 text. Really the best solution for most cases. (Like the above case)

 However, there is a use case for a more efficient way to deal with text.
 There are a couple ways to go about that that have been brought up here:

 1: have a more efficient unicode dtype: variable length,
 multiple encoding options, etc
 - This is a fine idea that would support better text handling in
 numpy, and _maybe_ better interaction with external libraries (HDF, etc...)

 2: Have a one-byte-per-char text dtype:
    - This would be much easier to implement and fit into the current numpy
  model, and satisfy a lot of common use cases for scientific data sets.


 We could certainly do both, but I'd like to see (2) get done sooner rather
 than later.


This is pretty much my sense of things at the moment. I think 1) is needed
in the long term but that 2) is a quick fix that solves most problems in
the short term.



 A related issue is whether numpy needs a dtype analogous to py3 bytes --
 I'm still not sure of the use-case there, so can't comment -- would it need
 to be fixed length (fitting into the numpy data model better) or variable
 length, or ??? Some folks are (apparently) using the current 'S' type in
 this way, but I think that's ripe for errors, due to the null bytes issue.
 Though maybe there is a null-bytes-are-special binary format that isn't
 text -- I have no idea.

 So what do we do with 'S'? It really is pretty broken, so we have a
 couple of choices:

  (1) deprecate it, so that it stays around for backward compatibility,
 but encourage people to use 'U' for text, or one of the new dtypes yet to
 be implemented (maybe 's' for a one-byte-per-char dtype); and for binary
 data, to use either uint8 or a new bytes dtype that is yet to be
 implemented.

  (2) fix it -- in this case, I think we need to be clear what it is:
  -- A one-byte-per-char text type? If so, it should map to a py3
 string, and have a defined encoding (ascii or latin-1, probably), or even
 better a settable encoding (but only for one-byte-per-char encodings -- I
 don't think utf-8 is a good idea here, as a utf-8 encoded string is of
 unknown length). There is some room for debate here: since the 'S' type
 is fixed length and truncates anyway, maybe it's fine for it to truncate
 utf-8 -- as long as it doesn't partially truncate in the middle of a
 character.


I think we should make it a one-byte encoded type, compatible with str in
python 2, and maybe latin-1 in python 3. I'm thinking latin-1 because of
PEP 393, where it is effectively the one-byte (UCS-1) case, but ascii might
be a bit more flexible because it is a subset of utf-8 and might serve
better in python 2.
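
For instance, a quick illustration of why latin-1 gives a clean
one-byte-per-char mapping (the example text is mine):

>>> s = 'Grüß Gott'            # non-ASCII, but all latin-1 characters
>>> b = s.encode('latin-1')    # exactly one byte per character
>>> len(s), len(b)
(9, 9)
>>> b.decode('latin-1') == s
True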


-- a bytes type? In which case, we should clean out all the
  automatic conversions to/from text that are in it now.


I'm not sure what to 

Re: [Numpy-discussion] Numpy arrays vs typed memoryviews

2014-01-25 Thread Sturla Molden
I think I have said this before, but it's worth a repeat:

Pickle (including cPickle) is a slow hog! That might not be the overhead
you see -- you just haven't noticed it yet.

I saw this some years ago when I worked on shared memory arrays for Numpy
(cf. my account on Github). Shared memory really did not help to speed up
the IPC, because the entire overhead was dominated by pickle. (Shared
memory is a fine way of saving RAM, though.)

multiprocessing.Queue will use pickle for serialization, and is therefore
not the right tool for numerical parallel computing with Cython or NumPy.

In order to use multiprocessing efficiently with NumPy, we need a new Queue
type that knows about NumPy arrays (and/or Cython memoryviews) and treats
them as special cases. Getting rid of pickle altogether is the important
part, not facilitating its use even further. It is easy to make a Queue
type for Cython or NumPy arrays using a duplex pipe and a couple of mutexes.
Or you can use shared memory as a ring buffer and atomic compare-and-swap on
the first bytes as spinlocks. It is not difficult to get the overhead of
queuing arrays down to little more than a memcpy.
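
A minimal sketch of the direction I mean (the names and sizes here are
just illustrative; the point is that the buffer is shared at process
creation instead of being pickled through a Queue):

import numpy as np
from multiprocessing import Process
from multiprocessing.sharedctypes import RawArray

def worker(shared, shape):
    # Re-wrap the shared buffer as an ndarray: no copy, no pickling of the data.
    arr = np.frombuffer(shared, dtype=np.float64).reshape(shape)
    arr *= 2.0                      # modifications are visible to the parent

if __name__ == '__main__':
    shape = (1000,)
    shared = RawArray('d', int(np.prod(shape)))
    np.frombuffer(shared, dtype=np.float64).reshape(shape)[:] = 1.0
    p = Process(target=worker, args=(shared, shape))
    p.start(); p.join()
    print(np.frombuffer(shared, dtype=np.float64)[:3])   # -> [ 2.  2.  2.]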

I've been wanting to do this for a while, so maybe it is time to start a
new toy project :) 

Sturla


Neal Hughes hughes.n...@gmail.com wrote:
 I like Cython a lot. My only complaint is that I have to keep switching
 between the numpy array support and typed memory views. Both have their
 advantages but neither can do everything I need.
 
 Memoryviews have the clean syntax and seem to work better in cdef classes
  and in inline functions.
 
 But Memoryviews can't be pickled and so can't be passed between
 processes. Also there seems to be a high overhead on converting between
 memory views and python numpy arrays. Where this overhead is a problem,
 or where I need to use Python's multiprocessing module, I tend to switch
 to numpy arrays.
 
 If memory views could be converted into Python objects fast, and pickled, I
 would have no need for the old numpy array support.
 
 Wondering if these problems will ever be addressed, or if I am missing 
 something completely.
 

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Comparison changes

2014-01-25 Thread Stéfan van der Walt
On Sat, 25 Jan 2014 01:05:15 +0100, Sebastian Berg wrote:
 1. Comparison with None will broadcast in the future, so that `arr ==
 None` will actually compare all elements to None. (A FutureWarning for
 now)

This is a very useful change in behavior--thanks!

Stéfan

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Text array dtype for numpy

2014-01-25 Thread Oscar Benjamin
On 24 January 2014 22:43, Chris Barker chris.bar...@noaa.gov wrote:
 Oscar,

 Cool stuff, thanks!

 I'm wondering though what the use-case really is.

The use-case is precisely the use-case for dtype='S' on Py2 except
that it also works on Py3.

 The Py3 text model
 (actually the py2 one, too) is quite clear that you want users to think of,
 and work with, text as text -- and not care how things are encoded in the
 underlying implementation. You only want the user to think about encodings
 on I/O -- transferring stuff between systems where you can't avoid it. And
 you might choose different encodings based on different needs.

Exactly. But what you're missing is that storing text in a numpy array
means putting the text into bytes, so the encoding needs to be specified.
My proposal involves explicitly specifying the encoding. This is the
key point about the Python 3 text model: it is not that encoding isn't
automatic (e.g. when you print() or call file.write with a text file);
the point is that there must never be ambiguity about the encoding
that is used when encode/decode occurs.

 So why have a different, the-user-needs-to-think-about-encodings numpy
 dtype? We already have 'U' for full-on unicode support for text. There is a
 good argument for a more compact internal representation for text compatible
 with one-byte-per-char encoding, thus the suggestion for such a dtype. But I
 don't see the need for quite this. Maybe I'm not being a creative enough
 thinker.

Because users want to store text in a numpy array and use less than 4
bytes per character. You expressed a desire for this. The only
difference between this and your latin-1 suggestion is that this one
has an explicit encoding that is visible to the user and that you can
choose that encoding to be anything that your Python installation
supports.

 Also, we may want numpy to interact at a low level with other libs that
 might have binary encoded text (HDF, etc) -- in which case we need a bytes
 dtype that can store that data, and perhaps encoding and decoding ufuncs.

Perhaps there is a need for a bytes dtype as well. But note that you
can use textarray with encoding='ascii' to satisfy many of these use
cases, so h5py and pytables can expose an interface that stores text
as bytes but has a clearly labelled (and enforced) encoding.

 If we want a more efficient and compact unicode implementation then the py3
 one is a good place to start -- it's pretty slick! Though maybe harder to do
 in numpy, as text in numpy probably wouldn't be immutable.

It's not a good fit for numpy because numpy arrays expose their memory
buffer. More on this below, but if there were to be something as drastic
as the FSR (CPython's flexible string representation), then it would be
better to think about how to make an ndarray type that is completely
different, has an opaque memory buffer and can handle arbitrary length
text strings.

 To make a slightly more concrete proposal, I've implemented a pure
 Python ndarray subclass that I believe can consistently handle
 text/bytes in Python 3.

 this scares me right there -- is it text or bytes??? We really don't want
 something that is both.

I believe that there is a conceptual misunderstanding about what a
numpy array is here.

A numpy array is a clever view onto a memory buffer. A numpy array
always has two interfaces, one that describes a memory buffer and one
that delivers Python objects representing the abstract quantities
described by each portion of the memory buffer. The dtype specifies
three things:
1) How many bytes of the buffer are used.
2) What kind of abstract object this part of the buffer represents.
3) The mapping from the bytes in this segment of the buffer to the
abstract object.

As an example:

>>> import numpy as np
>>> a = np.array([1, 2, 3], dtype='<u4')
>>> a
array([1, 2, 3], dtype=uint32)
>>> a.tostring()
b'\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00'

So what is this array? Is it bytes or is it integers? It is both. The
array is a view onto a memory buffer and the dtype is the encoding
that describes the meaning of the bytes in different segments. In this
case the dtype is '<u4'. This tells us that we need 4 bytes per
segment, that each segment represents an integer and that the mapping
from byte segments to integers is the unsigned little-endian mapping.

How can we do the same thing with text? We need a way to map text to
fixed-width bytes. Mapping text to bytes is done with text encodings.
So we need a dtype that incorporates a text encoding in order to
define the relationship between the bytes in the array's memory buffer
and the abstract entity that is a sequence of Unicode characters.
Using dtype='U' doesn't get around this:

>>> a = np.array(['qwe'], dtype='U')
>>> a
array(['qwe'],
      dtype='<U3')
>>> a[0]          # text
'qwe'
>>> a.tostring()  # bytes
b'q\x00\x00\x00w\x00\x00\x00e\x00\x00\x00'

In my proposal you'd get the same by using 'utf-32-le' as the encoding
for your text array.
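
To make the round trip concrete (this is just plain 'S' storage plus an
explicitly chosen codec, not the textarray class itself; reprs are
illustrative):

>>> encoding = 'latin-1'                  # chosen explicitly by the user
>>> raw = np.array([s.encode(encoding) for s in ['qwe', 'asd']])
>>> raw
array([b'qwe', b'asd'], dtype='|S3')
>>> [b.decode(encoding) for b in raw]     # decode on the way out
['qwe', 'asd']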

 The idea is that the array has an encoding. It stores strings as
 bytes. The 

[Numpy-discussion] ANN: numexpr 2.3 (final) released

2014-01-25 Thread Francesc Alted
==
  Announcing Numexpr 2.3
==

Numexpr is a fast numerical expression evaluator for NumPy.  With it,
expressions that operate on arrays (like 3*a+4*b) are accelerated
and use less memory than doing the same calculation in Python.
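
For example (a minimal usage sketch; the array sizes are just illustrative):

>>> import numpy as np, numexpr as ne
>>> a = np.random.rand(10**6)
>>> b = np.random.rand(10**6)
>>> c = ne.evaluate('3*a + 4*b')    # evaluated in chunks, using all cores
>>> np.allclose(c, 3*a + 4*b)
True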

It sports multi-threaded capabilities, as well as support for Intel's
MKL (Math Kernel Library), which allows an extremely fast evaluation
of transcendental functions (sin, cos, tan, exp, log...) while
squeezing the last drop of performance out of your multi-core
processors.  Look here for some benchmarks of numexpr using MKL:

https://github.com/pydata/numexpr/wiki/NumexprMKL

Its only dependency is NumPy (MKL is optional), so it works well as an
easy-to-deploy, easy-to-use, computational engine for projects that
don't want to adopt other solutions requiring more heavy dependencies.
Numexpr is already being used in a series of packages (PyTables, pandas,
BLZ...) to help do computations faster.


What's new
==

The repository has been migrated to https://github.com/pydata/numexpr.
All new tickets and PRs should be directed there.

Also, a `conj()` function for computing the conjugate of complex arrays 
has been added.
Thanks to David Menéndez.  See PR #125.
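
For instance (a small sketch; output formatting may differ slightly):

>>> import numpy as np, numexpr as ne
>>> z = np.array([1+2j, 3-4j])
>>> ne.evaluate('conj(z)')
array([ 1.-2.j,  3.+4.j])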

Finally, we fixed a DeprecationWarning derived from using ``oa_ndim ==
0`` and ``op_axes == NULL`` with `NpyIter_AdvancedNew()` and
NumPy 1.8.  Thanks to Mark Wiebe for advice on how to fix this
properly.

Many thanks to Christoph Gohlke and Ilan Schnell for their help during
the testing of this release in all kinds of possible combinations of
platforms and MKL.

In case you want to know more in detail what has changed in this
version, see:

https://github.com/pydata/numexpr/wiki/Release-Notes

or have a look at RELEASE_NOTES.txt in the tarball.

Where can I find Numexpr?
=

The project is hosted at GitHub in:

https://github.com/pydata/numexpr

You can get the packages from PyPI as well (but not for RC releases):

http://pypi.python.org/pypi/numexpr

Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.


Enjoy data!

-- Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: python-blosc 1.2.0 released

2014-01-25 Thread Francesc Alted
=
Announcing python-blosc 1.2.0
=

What is new?


This release adds support for the multiple compressors added in Blosc
1.3 series.  The new compressors are:

* lz4 (http://code.google.com/p/lz4/): A very fast
   compressor/decompressor.  Could be thought of as a replacement for the
   original BloscLZ, but it can behave better in some scenarios.

* lz4hc (http://code.google.com/p/lz4/): This is a variation of LZ4
   that achieves a much better compression ratio at the cost of being
   much slower at compressing.  Decompression speed is unaffected (and
   sometimes better than when using LZ4 itself!), so this is very good
   for read-only datasets.

* snappy (http://code.google.com/p/snappy/): A very fast
   compressor/decompressor.  Could be thought of as a replacement for the
   original BloscLZ, but it can behave better in some scenarios.

* zlib (http://www.zlib.net/): This is a classic.  It achieves very
   good compression ratios, at the cost of speed.  However,
   decompression speed is still pretty good, so it is a good candidate
   for read-only datasets.

Selecting the compressor is just a matter of specifying the new `cname`
parameter in compression functions.  For example::

   a_in = numpy.arange(N, dtype=numpy.int64)
   out = blosc.pack_array(a_in, cname='lz4')
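
and the compressed buffer can be expanded back into a NumPy array with
the companion call (a quick sketch; see the python-blosc docs for the
exact signature)::

   a_out = blosc.unpack_array(out)
   assert numpy.array_equal(a_in, a_out)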

Just to have an overview of the differences between the different
compressors in the new Blosc, here is the output of the included
compress_ptr.py benchmark:

https://github.com/ContinuumIO/python-blosc/blob/master/bench/compress_ptr.py

that compresses/decompresses NumPy arrays with different data
distributions::

   Creating different NumPy arrays with 10**7 int64/float64 elements:
 *** np.copy()    Time for memcpy():        0.030 s

   *** the arange linear distribution ***
 *** blosclz  *** Time for comp/decomp: 0.013/0.022 s.  Compr ratio: 136.83
 *** lz4      *** Time for comp/decomp: 0.009/0.031 s.  Compr ratio: 137.19
 *** lz4hc    *** Time for comp/decomp: 0.103/0.021 s.  Compr ratio: 165.12
 *** snappy   *** Time for comp/decomp: 0.012/0.045 s.  Compr ratio:  20.38
 *** zlib     *** Time for comp/decomp: 0.243/0.056 s.  Compr ratio: 407.60

   *** the linspace linear distribution ***
 *** blosclz  *** Time for comp/decomp: 0.031/0.036 s.  Compr ratio:  14.27
 *** lz4      *** Time for comp/decomp: 0.016/0.033 s.  Compr ratio:  19.68
 *** lz4hc    *** Time for comp/decomp: 0.188/0.020 s.  Compr ratio:  78.21
 *** snappy   *** Time for comp/decomp: 0.020/0.032 s.  Compr ratio:  11.72
 *** zlib     *** Time for comp/decomp: 0.290/0.048 s.  Compr ratio:  90.90

   *** the random distribution ***
 *** blosclz  *** Time for comp/decomp: 0.083/0.025 s.  Compr ratio:   4.35
 *** lz4      *** Time for comp/decomp: 0.022/0.034 s.  Compr ratio:   4.65
 *** lz4hc    *** Time for comp/decomp: 1.803/0.039 s.  Compr ratio:   5.61
 *** snappy   *** Time for comp/decomp: 0.028/0.023 s.  Compr ratio:   4.48
 *** zlib     *** Time for comp/decomp: 3.146/0.073 s.  Compr ratio:   6.17

That means that Blosc in combination with LZ4 can compress at speeds
that can be up to 3x faster than a pure memcpy operation.
Decompression is a bit slower (but still on the same order as
memcpy()), probably because writing to memory is slower than reading.
This was using an Intel Core i5-3380M CPU @ 2.90GHz, running Python 3.3
and Linux 3.7.10, but YMMV (and will vary!).

For more info, you can have a look at the release notes in:

https://github.com/ContinuumIO/python-blosc/wiki/Release-notes

More docs and examples are available in the documentation site:

http://blosc.pydata.org


What is it?
===

python-blosc (http://blosc.pydata.org/) is a Python wrapper for the
Blosc compression library.

Blosc (http://blosc.org) is a high performance compressor optimized for
binary data.  It has been designed to transmit data to the processor
cache faster than the traditional, non-compressed, direct memory fetch
approach via a memcpy() call.  Whether this is achieved or not
depends on the data compressibility, the number of cores in the system,
and other factors.  See a series of benchmarks conducted for many
different systems: http://blosc.org/trac/wiki/SyntheticBenchmarks.

Blosc works well for compressing numerical arrays that contain data
with relatively low entropy, like sparse data, time series, grids with
regular-spaced values, etc.

There is also a handy command line tool for Blosc called Bloscpack
(https://github.com/esc/bloscpack) that allows you to compress large
binary datafiles on-disk.  Although the format for Bloscpack has not
stabilized yet, it allows you to effectively use Blosc from your
favorite shell.


Installing
==

python-blosc is in PyPI repository, so installing it is easy:

$ pip install -U blosc  # yes, you should omit the python- prefix


Download sources


The sources are managed through github 

[Numpy-discussion] ANN: BLZ 0.6.1 has been released

2014-01-25 Thread Francesc Alted
Announcing BLZ 0.6 series
=

What it is
--

BLZ is a chunked container for numerical data.  Chunking allows for
efficient enlarging/shrinking of the data container.  In addition, the data
can also be compressed to reduce memory/disk needs.  The compression
process is carried out internally by Blosc, a high-performance
compressor that is optimized for binary data.

The main objects in BLZ are `barray` and `btable`.  `barray` is meant
for storing multidimensional homogeneous datasets efficiently.
`barray` objects provide the foundations for building `btable`
objects, where each column is made of a single `barray`.  Facilities
are provided for iterating, filtering and querying `btables` in an
efficient way.  You can find more info about `barray` and `btable` in
the tutorial:

http://blz.pydata.org/blz-manual/tutorial.html
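
As a rough flavour of the API (a sketch only; see the tutorial above for
the authoritative syntax):

   import numpy as np
   import blz

   b  = blz.barray(np.arange(1e7))                       # in-memory, compressed
   bd = blz.barray(np.arange(1e7), rootdir='data.blz')   # same data, on disk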

BLZ can use numexpr internally so as to accelerate many vector and
query operations (although it can use pure NumPy for doing so too)
either from memory or from disk.  In the future, it is planned to use
Numba as the computational kernel and to provide better Blaze
(http://blaze.pydata.org) integration.


What's new
--

BLZ has been branched off from the Blaze project
(http://blaze.pydata.org).  BLZ was meant as a persistent format and
library for I/O in Blaze.  BLZ in Blaze is based on the previous carray
0.5, which is why this new version is labeled 0.6.

BLZ supports completely transparent storage on-disk in addition to
memory.  That means that *everything* that can be done with the
in-memory container can be done using the disk as well.

The advantage of a disk-based container is that the addressable space
is much larger than just your available memory.  Also, as BLZ is based
on a chunked and compressed data layout based on the super-fast Blosc
compression library, the data access speed is very good.

The format chosen for the persistence layer is based on the
'bloscpack' library and described in the Persistent format for BLZ
chapter of the user manual ('docs/source/persistence-format.rst').
More about Bloscpack here: https://github.com/esc/bloscpack

You may want to know more about BLZ in this blog entry:
http://continuum.io/blog/blz-format

In this version, support for Blosc 1.3 has been added, meaning that a new
`cname` parameter has been added to the `bparams` class, so that you can
select your preferred compressor from 'blosclz', 'lz4', 'lz4hc', 'snappy'
and 'zlib'.

Also, many bugs have been fixed, providing a much smoother experience.

CAVEAT: The BLZ/bloscpack format is still evolving, so don't rely on
forward compatibility of the format, at least until 1.0, when the
internal format will be declared frozen.


Resources
-

Visit the main BLZ site repository at:
http://github.com/ContinuumIO/blz

Read the online docs at:
http://blz.pydata.org/blz-manual/index.html

Home of Blosc compressor:
http://www.blosc.org

User's mail list:
blaze-...@continuum.io



Enjoy!

Francesc Alted
Continuum Analytics, Inc.

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion