Forgot to copy the list.

On Tue, Dec 8, 2009 at 11:52 AM, Faisal Moledina
<faisal.moled...@gmail.com> wrote:
> Hello Francesc,
>
> Thank you for your response.
>
> On Fri, Dec 4, 2009 at 11:42 AM, Francesc Alted <fal...@pytables.org> wrote:
>> Hello Faisal,
>>
>> In a few words, the rules of thumb for using PyTables efficiently in these
>> situations are:
>>
>> - Don't create too many leaves. It is better to stick with 'long' tables or
>> arrays.
>>
>> - Do not create too 'wide' (that is, with too many columns) tables. If you
>> need a lot of fields in a row (taking more than, say, 16 KB/row), it is far
>> better to move some of the variables into EArrays (if their number of entries
>> per table row is fixed) or VLArrays (if there is a variable number of entries
>> per row).
>
> Makes sense. What I've done now is create a table with the info for each
> particle, plus two EArrays: one with shape (0, 3) for the current x, y, z
> coordinates (called currentpos), and one with shape (0, 5) for the timepoint,
> particle_id, and the historical x, y, z coordinates that make up the full
> trajectory of each particle. The currentpos array is read at each timepoint
> to figure out each particle's next position and whether it has left the
> system or been captured by a source/sink.
>
>> Yeah. The advice there continues to apply to your situation. Keep trying
>> to understand it, and if you end up with a possible implementation and want
>> to optimize it still further, you may want to share it with us so that we
>> can comment on it.
>
> Optimization needed! I need to treat separate groups of particles in the
> system differently, so I always access the currentpos array with fancy
> slicing on the rows to pick out a subset of particles, something like:
>
> xyz0 = currentpos[part_id, :]
>
> However, once my currentpos array reaches 1e5 rows, it takes about 15-20 s
> just to perform this step once. Over the course of my simulation, the
> percentage of time spent per line shifts drastically toward accessing a
> fancy slice of the currentpos array. I did not use the expectedrows option
> or compression here.
>
> To fix this, I'm going to try folding the x, y, z coordinates into the main
> info table. That way, I can read x0, y0, z0 using something like:
>
> x = particle_info.col('x')
> x0 = x[part_id]
>
> which gives me a NumPy array. Fancy-slicing a NumPy array in memory seems
> to be much faster than fancy-slicing an EArray on disk. I tested this with
> the following script:
>
> import numpy
> from tables import *
> import os
>
> def slicetest(arr, sli):
>     return arr[sli, :]
>
> def uniqify(seq):
>     """http://www.peterbe.com/plog/uniqifiers-benchmark"""
>     seen = set()
>     return [x for x in seq if x not in seen and not seen.add(x)]
>
> n = 1000000
> s = 50000
>
> a = numpy.random.uniform(size=(n, 3))
>
> # (n - 1) keeps the rounded indices within the bounds of a
> b = uniqify(numpy.round((n - 1)*numpy.random.uniform(size=s)).astype('int').tolist())
>
> h5f = "numpy-slice-test.h5"
> if os.path.exists(h5f): os.remove(h5f)
> h5 = openFile(h5f, mode="w", title="Brownian simulation")
> h5a = h5.createEArray(h5.root, 'testarray', Float64Atom(), (0, 3), "Test array")
> h5a.append(a)
>
> Then, in IPython:
>
> In [39]: %timeit slicetest(a,b)
> 100 loops, best of 3: 17.8 ms per loop
>
> and
>
> In [40]: %timeit slicetest(h5a,b)
>
> ...is still running after a few minutes. I'll report back my findings.
>
> Faisal
>
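As a rough sketch of the in-memory column-slicing idea described above (this
is not code from the thread; the file name, table layout, column names, and
part_id values are made-up placeholders), assuming the same PyTables 2.x API
the script above already uses, the pattern might look like this:

import numpy
from tables import openFile, IsDescription, Int64Col, Float64Col

class Particle(IsDescription):
    particle_id = Int64Col()
    x = Float64Col()
    y = Float64Col()
    z = Float64Col()

n = 100000

h5 = openFile("particle-sketch.h5", mode="w")
# expectedrows lets PyTables pick a sensible chunk size up front
info = h5.createTable(h5.root, 'particle_info', Particle,
                      "Particle info", expectedrows=n)

# fill the table with dummy particles
coords = numpy.random.uniform(size=(n, 3))
row = info.row
for i in range(n):
    row['particle_id'] = i
    row['x'], row['y'], row['z'] = coords[i]
    row.append()
info.flush()

# Slow pattern from the message above: fancy-slicing an on-disk leaf for
# every subset of particles, e.g. xyz0 = currentpos[part_id, :].
# Faster pattern: read each coordinate column into memory once per
# timestep with Table.col(), then fancy-index the NumPy arrays.
part_id = [10, 500, 42137]      # hypothetical subset of particle rows
x = info.col('x')
y = info.col('y')
z = info.col('z')
xyz0 = numpy.column_stack((x[part_id], y[part_id], z[part_id]))

h5.close()

Pulling whole columns into memory trades RAM for speed; with 1e5-1e6 rows of
float64 coordinates that is only a few MB per column, so each timestep's
fancy indexing then happens at NumPy speed.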
Also, the test on the EArray just finished:

In [40]: %timeit slicetest(h5a,b)
1 loops, best of 3: 438 s per loop

Faisal