Forgot to copy the list.

On Tue, Dec 8, 2009 at 11:52 AM, Faisal Moledina
<faisal.moled...@gmail.com> wrote:
> Hello Francesc,
>
> Thank you for your response.
>
> On Fri, Dec 4, 2009 at 11:42 AM, Francesc Alted <fal...@pytables.org> wrote:
>> Hello Faisal,
>>
>> In a few words, the rules of thumb for using PyTables efficiently in these
>> situations are:
>>
>> - Don't create too many leaves.  It is better to stick with 'long' tables or
>> arrays.
>>
>> - Do not create too 'wide' (that is, with too many columns) tables.  If you
>> need a lot of fields in a row (taking more than, say, 16 KB/row), it is way
>> better to save some of the variables into EArrays (if their number of entries
>> per table row is fixed) or VLArrays (if there is a variable number of entries
>> per row).
>
> Makes sense. What I've done now is create a table with the info for
> each particle, and two EArrays: one with shape (0,3) for the current x
> y z coordinates (called currentpos), and one with shape (0,5) for the
> timepoint, particle_id, and all historical x, y, z coordinates to make
> up an entire trajectory for each given particle. The currentpos array
> is used at each timepoint to figure out the next position for the
> particle, if it has left the system, or if it has been captured by a
> source/sink.
>
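In NumPy terms, the layout described above looks roughly like this (hypothetical field names and sizes; in the actual HDF5 file the first would be a Table and the other two EArrays growing along the first axis):

```python
import numpy

# Hypothetical per-particle info table: one fixed-size row per particle.
particle_info = numpy.zeros(4, dtype=[('particle_id', 'i8'),
                                      ('mass', 'f8'),
                                      ('radius', 'f8')])

# currentpos: one (x, y, z) row per particle, updated at every timepoint.
currentpos = numpy.zeros((4, 3))

# trajectory: grows by one (timepoint, particle_id, x, y, z) row per
# particle per timepoint, recording the full history.
trajectory = numpy.empty((0, 5))
step = numpy.column_stack([numpy.full(4, 0.0),            # timepoint
                           numpy.arange(4, dtype=float),  # particle_id
                           currentpos])
trajectory = numpy.vstack([trajectory, step])
print(trajectory.shape)  # (4, 5)
```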
>> Yeah.  The advice there continues to apply to your situation.  Keep trying to
>> understand it, and if you end up with a possible implementation and want to
>> optimize it still further, you may want to share it with us so that we can
>> comment on it.
>
> Optimization needed! I need to treat separate groups of particles in
> the system differently, and so I always access the currentpos array
> with fancy slicing for the rows to get a subset of particles. So,
> something like:
>
>    xyz0=currentpos[part_id,:]
>
> However, once my currentpos array reaches 1e5 rows, it takes about
> 15-20 s just to perform this step once. Over the course of my
> simulation, the percentage of time spent per line shifts drastically
> toward accessing a fancy slice of the currentpos array. At this time,
> I did not use the expectedrows option or compression.
>
> To fix this, I'm going to try incorporating the x y z points into the
> main info table. That way, I can save x0,y0,z0 using something like:
>
>    x=particle_info.col('x')
>    x0=x[part_id]
>
> which should give me a numpy array. Accessing numpy array slices seems
> to be faster than accessing EArrays. This was tested using the
> following script:
>
>    import numpy
>    from tables import *
>    import os
>
>    def slicetest(arr,sli):
>        return arr[sli,:]
>
>    def uniqify(seq):
>        """http://www.peterbe.com/plog/uniqifiers-benchmark""";
>        seen = set()
>        return [x for x in seq if x not in seen and not seen.add(x)]
>
>    n=1000000
>    s=50000
>
>    a=numpy.random.uniform(size=(n,3))
>    # randint stays within [0, n); round(n*uniform) can occasionally
>    # produce n, which would be an out-of-range row index
>    b=uniqify(numpy.random.randint(0,n,size=s).tolist())
>
>    h5f="numpy-slice-test.h5"
>    if os.path.exists(h5f): os.remove(h5f)
>    h5=openFile(h5f,mode="w",title="Brownian simulation")
>    h5a=h5.createEArray(h5.root,'testarray',Float64Atom(),(0,3),"Test array")
>    h5a.append(a)
>
> Then, in IPython:
>
>    In [39]: %timeit slicetest(a,b)
>    100 loops, best of 3: 17.8 ms per loop
>
> and
>
>    In [40]: %timeit slicetest(h5a,b)
>
> ...is still running after a few minutes. I'll report back my findings.
>
> Faisal
>

Also, the test on the EArray just finished:

    In [40]: %timeit slicetest(h5a,b)
    1 loops, best of 3: 438 s per loop
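
For what it's worth, the size of the gap makes sense: fancy-indexing the EArray costs roughly one HDF5 read per selected row, while NumPy gathers from RAM. If the array fits in memory, reading it in one go (EArray.read() returns a plain NumPy array) and then fancy-indexing should recover the in-memory speed. A rough pure-NumPy illustration of the two access patterns (hypothetical sizes):

```python
import numpy

n, s = 100000, 5000
a = numpy.random.uniform(size=(n, 3))
b = numpy.random.randint(0, n, size=s)

# Pattern 1: fetch the selected rows one at a time -- roughly what
# fancy-indexing an on-disk EArray amounts to (one read per row).
rows_slow = numpy.array([a[i] for i in b])

# Pattern 2: the whole array is already in memory, so a single
# vectorized fancy-index gathers every row at RAM speed.
rows_fast = a[b, :]

assert numpy.array_equal(rows_slow, rows_fast)
```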

Faisal

_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
