Hello Giovanni,

Great to hear that everything is working much better for you now, and that
it is much faster and smaller than NPY ;)

> Do you know how the default value is set btw?


This is computed by a magical heuristic algorithm, written by Francesc (?),
called computechunksize() [1].

This is really optimized for dense data (Tables), so it is not surprising
that it performs poorly in your case.  Any updates you want to make to
PyTables so that it also handles sparse data well out of the box would be
very welcome ;)

1. https://github.com/PyTables/PyTables/blob/develop/tables/idxutils.py#L54



On Mon, Jun 24, 2013 at 10:51 AM, Giovanni Luca Ciampaglia <
glciamp...@gmail.com> wrote:

> Hi Anthony,
>
> thanks for the explanation and the links, it's much clearer now. So without
> compression a CArray is really a smarter type of sparse file, but you have to
> set a sensible chunk shape. Do you know how the default value is set btw? I am
> asking because I didn't see any change in performance from using the default
> value and using (1, N), where (N,N) is the shape of the matrix. I guess that
> the write performance depends crucially on the size of the I/O buffer, so the
> default must be choosing a similar setting.
>
> Anyway, I have played a bit with other values of the chunk shape in
> conjunction with the compression level, and using a shape of (1,100) with
> complevel=5 gives speeds that are only 10-15% slower than what I get with
> shape=(1,1) and complevel=0. The resulting file is 10 times smaller, and
> something like 35 times smaller than a NPY sparse file, btw!
>
> Thanks!
>
> Giovanni
>
> On 06/24/2013 05:25 AM, pytables-users-request@lists.sourceforge.net wrote:
> > Hi Giovanni!
> >
> > I think that you may have some misunderstanding about how chunking works,
> > which is leading you to get terrible performance.  In fact, what you
> > describe is a great strategy (write all and zip) for using normal Arrays.
> >
> > However, chunking and CArrays don't work like this.  If a chunk contains no
> > data, it is not written at all!  Also, all zipping takes place on the chunk
> > level.  Thus, for very small chunks you can actually increase the file size
> > and access time by using compression.
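To illustrate the per-chunk compression point, here is a minimal sketch that uses Python's standard zlib module as a stand-in for the HDF5 deflate filter. This is an assumption for illustration only: it ignores the per-chunk index metadata HDF5 also keeps, and it just compresses a one-element float64 chunk (8 bytes) and a 1000-element chunk of zeros separately.

```python
import zlib


def compressed_size(chunk: bytes, level: int = 5) -> int:
    """Return the size of a single chunk after zlib compression."""
    return len(zlib.compress(chunk, level))


# A chunkshape=(1, 1) float64 chunk is only 8 bytes; zlib's 2-byte header and
# 4-byte Adler-32 checksum alone nearly match that, so compressing it makes it bigger.
tiny_chunk = bytes(8)

# A (1, 1000) chunk of float64 zeros is 8000 bytes and compresses very well.
large_chunk = bytes(8000)

for name, chunk in [("tiny", tiny_chunk), ("large", large_chunk)]:
    print(f"{name}: {len(chunk)} bytes raw -> {compressed_size(chunk)} bytes compressed")
```

Real chunks of a sparse matrix are rarely all zeros, so actual ratios depend on the data; the sketch only shows that the fixed per-chunk cost dominates when chunks are tiny.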
> >
> > For sparse matrices and CArrays, you need to play around with the
> > chunkshape argument to create_carray() and with compression.  Performance
> > is going to be affected by how dense the matrix is and how grouped its
> > nonzero entries are.  For example, for a very dense and randomly
> > distributed matrix, chunkshape=(1, 1) and no compression is best.  For
> > block diagonal matrices, the chunkshape should be the nominal block shape.
> > Compression is only useful here if the blocks all have similar values or
> > the block shape is large.  For example:
> >
> > 1 1 0 0 0 0
> > 1 1 0 0 0 0
> > 0 0 1 1 0 0
> > 0 0 1 1 0 0
> > 0 0 0 0 1 1
> > 0 0 0 0 1 1
> >
> > is well suited to a chunkshape=(2, 2).
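A quick way to see why (2, 2) fits this matrix is to count how many chunks contain at least one nonzero, since only those need to be written. The sketch below is plain Python; nonempty_chunks() is a hypothetical helper for illustration, not part of PyTables.

```python
import math
from itertools import product

# The 6x6 block-diagonal example above.
MATRIX = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
]


def nonempty_chunks(matrix, chunkshape):
    """Return (chunks holding at least one nonzero, total chunks in the grid)."""
    rows, cols = len(matrix), len(matrix[0])
    chunk_rows, chunk_cols = chunkshape
    # Map every nonzero (i, j) to the index of the chunk that contains it.
    touched = {
        (i // chunk_rows, j // chunk_cols)
        for i, j in product(range(rows), range(cols))
        if matrix[i][j] != 0
    }
    # Edge chunks may be partial, so round the grid size up.
    total = math.ceil(rows / chunk_rows) * math.ceil(cols / chunk_cols)
    return len(touched), total


for shape in [(1, 1), (2, 2), (3, 3), (6, 6)]:
    used, total = nonempty_chunks(MATRIX, shape)
    print(f"chunkshape={shape}: {used} of {total} chunks contain data")
```

With (2, 2), only 3 of 9 chunks hold data. With (3, 3), the chunk boundaries cut across the blocks, so all 4 chunks get written even though two-thirds of the matrix is zeros.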
> >
> > For more information on the HDF model, please see my talk slides and video
> > [1, 2].  I hope this helps.
> >
> > Be Well
> > Anthony
> >
> > PS. Glad to see you using the new API
> >
> > 1. https://github.com/scopatz/hdf5-is-for-lovers
> > 2. http://www.youtube.com/watch?v=Nzx0HAd3FiI
>
>
> --
> Giovanni Luca Ciampaglia
>
> Postdoctoral fellow
> Center for Complex Networks and Systems Research
> Indiana University
>
> ✎ 910 E 10th St ∙ Bloomington ∙ IN 47408
> ☞ http://cnets.indiana.edu/
> ✉ gciam...@indiana.edu
>
>
>
> ------------------------------------------------------------------------------
> This SF.net email is sponsored by Windows:
>
> Build for Windows Store.
>
> http://p.sf.net/sfu/windows-dev2dev
> _______________________________________________
> Pytables-users mailing list
> Pytables-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/pytables-users
>