Hi,

Thanks to both of you for the very interesting suggestions, which I am quite sure I wouldn't have found on my own! I'll look into both of them as soon as I have time.
There is another issue (this is "stupidity" of the code, not HDF5): it actually only tracks 64 particles at a time. I think that was due to some kind of memory limitation back in the day (the code is still F77 for the most part). That means I don't actually have all particles for a given timestep in memory at the same time during the simulation, so I end up with many dimensions quite quickly.

I'll keep you posted about my outcome! :)

Cheers,
Yngve

On Thursday 24 March 2011 08:51:43 PM Mark Howison wrote:
> ... and I'll throw in one more suggestion, the H5Part library:
>
> http://vis.lbl.gov/Research/H5Part/
>
> which allows you to quickly and easily dump out particle data into an HDF5 file.
>
> The data model is the same one Werner suggested: each timestep has its own group, and the particles are stored as 1D arrays within those groups. You can have a different number of particles in each timestep.
>
> For each iteration, you would do something like:
>
> file = H5PartOpenFileParallel("particles.h5", H5PART_WRITE, MPI_COMM_WORLD);
>
> for (i = 0; i < nsteps; i++) {
>     H5PartSetStep(file, i);
>     H5PartSetNumParticles(file, nparticles);
>     H5PartWriteDataFloat64(file, "x", x);
>     H5PartWriteDataFloat64(file, "y", y);
>     H5PartWriteDataFloat64(file, "z", z);
>     H5PartWriteDataFloat64(file, "px", px);
>     H5PartWriteDataFloat64(file, "py", py);
>     H5PartWriteDataFloat64(file, "pz", pz);
> }
>
> H5PartCloseFile(file);
>
> Hope that helps,
> Mark
>
> On Thu, Mar 24, 2011 at 3:35 PM, Pierre de Buyl <[email protected]> wrote:
> > Hello,
> >
> > I would like to make an additional suggestion.
> >
> > With some colleagues, we set out to devise a specification for how an HDF5 file should be laid out for data from particle-based simulations. The specification is called H5MD and is found here: http://research.colberg.org/projects/molsim/
> >
> > This is, for now, only a specification and not a library, but I think it provides a good basis for molecular simulations while being useful to other kinds of simulations.
> >
> > To handle a varying number of particles, it is possible to store the data in a [T][N][D] dataset (T is the number of timesteps, N the number of particles and D the number of spatial dimensions) in which a chunk size is defined along the particle-wise axis. That way, you can take N to be N_max, the maximum number of particles, and the space taken on disk will be zero for the non-written-to chunks.
> >
> > I hope it helps and welcome comments!
> >
> > Pierre de Buyl
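(For concreteness: with the plain HDF5 C API, the chunked [T][N][D] layout Pierre describes would look roughly like the sketch below. The dataset name "position", the sizes, and the chunk shape are only illustrative, not part of the H5MD spec.)

    #include <hdf5.h>

    int main(void)
    {
        /* [T][N][D]: unlimited time axis, N fixed at the maximum
           particle count, D = 3 spatial dimensions.  Because storage
           is chunked along the particle axis, chunks that are never
           written to occupy no space on disk. */
        hsize_t dims[3]    = {0, 1000000, 3};             /* start with T = 0 */
        hsize_t maxdims[3] = {H5S_UNLIMITED, 1000000, 3};
        hsize_t chunk[3]   = {1, 1024, 3};                /* 1 step x 1024 particles */

        hid_t file  = H5Fcreate("particles.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
        hid_t space = H5Screate_simple(3, dims, maxdims);
        hid_t dcpl  = H5Pcreate(H5P_DATASET_CREATE);
        H5Pset_chunk(dcpl, 3, chunk);                     /* chunking along N */

        hid_t dset = H5Dcreate2(file, "position", H5T_NATIVE_DOUBLE, space,
                                H5P_DEFAULT, dcpl, H5P_DEFAULT);

        H5Dclose(dset);
        H5Pclose(dcpl);
        H5Sclose(space);
        H5Fclose(file);
        return 0;
    }

(Each step is then written by extending the time axis by one and writing a [1][n][D] hyperslab into the new slice.)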
> > On Wed, 16 Mar 2011 07:15:00 -0700, Werner Benger wrote:
> >> Yngve,
> >>
> >> especially if the number of particles might change over time, using 1D arrays might be more appropriate, possibly combined with index lookup arrays that allow you to identify particles from T0 to T1 and vice versa. I'm using such a 1D layout for particles and particle trajectories as part of my F5 library; here is a coding example of how to write particle positions with some fields given on them (it's all HDF5-based):
> >>
> >> http://svn.origo.ethz.ch/wsvn/f5/doc/Particles_8c-example.html
> >>
> >> It's inefficient only for very few particles, because the overhead of the metadata structure is then more prominent, but for millions of particles it works well. I haven't tried this structure yet with a million timesteps, which would lead to a million groups. I would assume HDF5 is able to handle such a situation well, but it could make sense to bundle groups of timesteps hierarchically, too.
> >>
> >> On Wed, 16 Mar 2011 08:29:27 -0500, Yngve Inntjore Levinsen wrote:
> >>
> >>> Yes of course Francesc, I was thinking float = half of 64 bit instead of 4x 8 bit :) I was thinking that it might be beneficial to keep the size in powers of 2, so that is why I chose 1024 and not 1000. I keep it as a variable so I can easily change it.
> >>>
> >>> Werner, I was thinking that I should eventually move to a sequence of 1D arrays, but it requires slightly more rewriting. The number of lines I have to write depends on whether or not the particle is still alive. I am starting out with an equal number of particles, but have no means of knowing whether I need to write the position of a given particle zero times or one million times. Typically I have something like 1 million timesteps, but I do not write down trajectories all the time (when depends on the Monte Carlo, so there is no way to know in advance).
> >>>
> >>> Ideally I would have written all the analysis into the code itself so I didn't have to write the trajectories all the time (I have not made this choice!), but that requires too much work for me to handle at the moment. Using HDF5 will reduce the storage space needed by about a factor of 6 from my estimates, improve precision, and significantly reduce the CPU hours needed as well. This is already a great improvement!
> >>>
> >>> Cheers,
> >>> Yngve
> >>>
> >>> On Wednesday 16 March 2011 02:09:36 PM Werner Benger wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> what's the reason for using a 2D extendable dataset instead of a sequence of 1D arrays in a group, using one group per time step? How many particles and time steps do you have typically? I assume in your case the number of particles is constant over time?
> >>>>
> >>>> Cheers,
> >>>> Werner
> >>>>
> >>>> On Wed, 16 Mar 2011 03:52:10 -0500, Yngve Inntjore Levinsen <> wrote:
> >>>>
> >>>> > Dear hierarchical people,
> >>>> >
> >>>> > I have recently converted a piece of code from using a simple ascii format for output to using HDF5. What the code does is, at every iteration, dump some information about particle energy/trajectory/position to the ascii file (this is a particle tracking code).
> >>>> >
> >>>> > Initially I did the same with the HDF5 library, having an unlimited row dimension in a 2D array, using h5extend_f to extend by one element each time and writing a hyperslab of one row to the file. As some (perhaps most) of you might have guessed or known already, this was a rather bad idea. The file (without compression) was about the same size as the ascii file (but obviously with higher precision), and reading the file in subsequent analysis was at least an order of magnitude slower.
> >>>> >
> >>>> > I then realized that I probably needed to write less frequently and rather keep a semi-large hyperslab in memory. I chose a hyperslab of 1000 rows, but otherwise used the same procedure. This seems to be both fast and, with compression, to create quite a bit smaller file. I tried even larger slabs, but did not see any speed improvement in my initial testing.
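(For reference: in the C API, the buffered scheme described above, accumulating 1000 rows in memory and then extending the dataset and writing the block as a single hyperslab, looks roughly like the sketch below. The function name, the 6-column layout, and the buffer size are illustrative; Yngve's actual code uses the Fortran API.)

    #include <hdf5.h>

    #define NCOLS 6     /* e.g. x, y, z, px, py, pz per row */
    #define NBUF  1000  /* rows accumulated in memory between writes */

    /* Append nbuf buffered rows to an extendable, chunked 2D dataset:
       grow the row dimension, select the new rows as a hyperslab in
       the file, and write the whole block with one call. */
    void flush_rows(hid_t dset, const double buffer[][NCOLS],
                    hsize_t nrows_in_file, hsize_t nbuf)
    {
        hsize_t newdims[2] = {nrows_in_file + nbuf, NCOLS};
        H5Dset_extent(dset, newdims);        /* grow along the row axis */

        hid_t fspace = H5Dget_space(dset);
        hsize_t start[2] = {nrows_in_file, 0};
        hsize_t count[2] = {nbuf, NCOLS};
        H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start, NULL, count, NULL);

        hid_t mspace = H5Screate_simple(2, count, NULL);
        H5Dwrite(dset, H5T_NATIVE_DOUBLE, mspace, fspace, H5P_DEFAULT, buffer);

        H5Sclose(mspace);
        H5Sclose(fspace);
    }

(The point is that each H5Dwrite now covers 1000 rows instead of one, so far fewer extend/write cycles reach the file.)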
> >>>> > My question really was just whether there are some recommended ways to do this? I would imagine I am not the first who wants to use HDF5 in this way, dumping some data at every iteration of a given simulation, without having to keep it all in memory until the end.
> >>>> >
> >>>> > Thanks for all the explanations/suggestions/experiences related to this problem you can provide, so I can make the best design choices in my program! :)
> >>>> >
> >>>> > Cheers,
> >>>> > Yngve
> >
> > -----------------------------------------------------------
> > Pierre de Buyl
> > Physique des Systèmes Complexes et Mécanique Statistique - Université Libre de Bruxelles
> > Chemical Physics Theory Group - University of Toronto
> > web: http://homepages.ulb.ac.be/~pdebuyl/
> > -----------------------------------------------------------
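(And for completeness: the one-group-per-timestep layout that Werner and Mark recommend reduces to roughly the following sketch in the plain HDF5 C API. The "Step#%d" group naming mirrors H5Part's convention; the single "x" dataset stands in for all per-particle properties, and n may differ from step to step.)

    #include <hdf5.h>
    #include <stdio.h>

    /* Write one particle property for one timestep: a group per step,
       with each property stored as a 1D array of that step's particle
       count, so different steps may hold different numbers of particles. */
    void write_step(hid_t file, int step, const double *x, hsize_t n)
    {
        char name[32];
        snprintf(name, sizeof name, "Step#%d", step);

        hid_t grp = H5Gcreate2(file, name, H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

        hsize_t dims[1] = {n};
        hid_t space = H5Screate_simple(1, dims, NULL);
        hid_t dset  = H5Dcreate2(grp, "x", H5T_NATIVE_DOUBLE, space,
                                 H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
        H5Dwrite(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, x);

        H5Dclose(dset);
        H5Sclose(space);
        H5Gclose(grp);
    }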
