Re: [Pytables-users] Chunk selection for optimized data access

Anthony Scopatz Mon, 03 Jun 2013 08:51:13 -0700

Hi Andreas,

First off, nothing should be this bad, but...

What is the data type of the array?  Also, are you setting the chunkshape
manually or letting PyTables figure it out?
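
To see why the chunkshape matters for this access pattern, here is a rough
back-of-the-envelope sketch in plain Python (no PyTables calls).  It counts
how many chunks one time series at a fixed (i, j) touches and how many bytes
would have to be decompressed to serve it.  The candidate chunkshapes and the
4-byte item size are assumptions for illustration only; they are not what
PyTables would pick, and the real dtype is not known from this thread.

```python
import math


def chunks_touched(chunkshape, selection):
    """Count chunks intersected by a box selection.

    `selection` holds one (start, stop) pair per axis, stop exclusive.
    """
    n = 1
    for (start, stop), c in zip(selection, chunkshape):
        first = start // c          # index of first chunk hit on this axis
        last = (stop - 1) // c      # index of last chunk hit on this axis
        n *= last - first + 1
    return n


def bytes_decompressed(chunkshape, n_chunks, itemsize):
    """Whole chunks are read and decompressed, even for a single value."""
    return n_chunks * math.prod(chunkshape) * itemsize


# One full time series at (5000, 1000) from a 5760 x 2880 x 150 array.
selection = ((5000, 5001), (1000, 1001), (0, 150))
itemsize = 4  # ASSUMPTION: 4-byte items (e.g. float32)

# Hypothetical chunkshapes, chosen only to contrast the layouts.
for chunkshape in [(64, 64, 8), (16, 16, 150), (1, 1, 150)]:
    n = chunks_touched(chunkshape, selection)
    nbytes = bytes_decompressed(chunkshape, n, itemsize)
    print(chunkshape, "chunks:", n, "bytes decompressed:", nbytes)
```

Chunks that are long along the time axis keep a single series inside few
chunks, while chunks that are wide in the two spatial axes force a lot of
unrelated data to be decompressed per series.  Very small chunks carry their
own per-chunk overhead, so the extreme cases here are for comparison rather
than a recommendation.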

Here are some things that you can try:

1. Query with fancy indexing, once.  That is, rather than using a list
comprehension, just say _a[zip(*idx)].

2. Set _a.nrowsinbuf [1] to a much smaller value (1, 5, or 10), which is
more appropriate for pulling out individual indexes.
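
A minimal sketch of how one might experiment with the second suggestion
together with an explicit chunkshape, assuming PyTables 3.x naming
(open_file, create_earray).  The temporary file, the tiny array sizes, the
chunkshape, and the index values are all made up for illustration; whether
nrowsinbuf changes the timing for this read pattern is something to measure
on the real data.

```python
import os
import tempfile

import numpy as np
import tables

# Hypothetical scratch file; the real database would already exist.
path = os.path.join(tempfile.mkdtemp(), "demo.h5")

with tables.open_file(path, "w") as h5:
    # Small stand-in for the 5760 x 2880 x ~150 array; the last axis
    # (time) is the extendable one, hence the 0 in `shape`.
    arr = h5.create_earray(
        h5.root, "a",
        atom=tables.Float32Atom(),      # ASSUMPTION: dtype is not known
        shape=(40, 30, 0),
        chunkshape=(4, 4, 16),          # hypothetical, long along time
        filters=tables.Filters(complevel=5, complib="zlib"),
    )

    # Append 12 time steps at once along the extendable axis.
    data = np.arange(40 * 30 * 12, dtype=np.float32).reshape(40, 30, 12)
    arr.append(data)

    # Suggestion 2: shrink the internal buffer before point-like reads.
    arr.nrowsinbuf = 5

    # Same extraction pattern as in the original question.
    idx = ((35, 6, 8, 9), (10, 20, 5, 1))
    series = np.vstack([arr[i, j, :] for i, j in zip(*idx)])
    print(series.shape)
```

Timing this loop for a few chunkshape and nrowsinbuf combinations on a copy
of the real file should show which setting dominates.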

Lastly, it is my opinion that the iteration mechanics are slower than they
could and should be.  I have a bunch of ideas about how to make them faster
AND clean up the code base, but I won't have a ton of time to work on them in
the near term.  However, if this is something that you are interested in
working on, that would be great!  I'd love to help out anyone who is willing
to take this on.

Be Well
Anthony

[1] http://pytables.github.io/usersguide/libref/hierarchy_classes.html#tables.Leaf.nrowsinbuf

On Mon, Jun 3, 2013 at 7:45 AM, Andreas Hilboll <li...@hilboll.de> wrote:

> On 03.06.2013 14:43, Andreas Hilboll wrote:
> > Hi,
> >
> > I'm storing large datasets (5760 x 2880 x ~150) in a compressed EArray
> > (the last dimension represents time, and once per month there'll be one
> > more 5760x2880 array to add to the end).
> >
> > Now, extracting timeseries at one index location is slow; e.g., for four
> > indices, it takes several seconds:
> >
> >    In [19]: idx = ((5000, 600, 800, 900), (1000, 2000, 500, 1))
> >
> >    In [20]: %time AA = np.vstack([_a[i,j] for i,j in zip(*idx)])
> >    CPU times: user 4.31 s, sys: 0.07 s, total: 4.38 s
> >    Wall time: 7.17 s
> >
> > I have the feeling that this performance could be improved, but I'm not
> > sure about how to properly use the `chunkshape` parameter in my case.
> >
> > Any help is greatly appreciated :)
> >
> > Cheers, Andreas.
>
> PS: If I could get significant performance gains by not using an EArray
> and therefore re-creating the whole database each month, then this would
> also be an option.
>
> -- Andreas.
>
>
>
> ------------------------------------------------------------------------------
> Get 100% visibility into Java/.NET code with AppDynamics Lite
> It's a free troubleshooting tool designed for production
> Get down to code-level detail for bottlenecks, with <2% overhead.
> Download for free and get started troubleshooting in minutes.
> http://p.sf.net/sfu/appdyn_d2d_ap2
> _______________________________________________
> Pytables-users mailing list
> Pytables-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/pytables-users
>

