Hi,

Thanks to both of you for the very interesting suggestions, which I am quite sure I wouldn't have found on my own! I'll look into both of them as soon as I have time.
There is another issue (this is "stupidity" of the code, not HDF5): it actually only tracks 64 particles at a time. I think that was due to some kind of memory limitation back in the day (the code is still F77 for the most part). That means I don't actually have all particles for a given timestep in memory at the same time during the simulation, so I end up with many dimensions quite quickly.

I'll keep you posted about my outcome! :)

Cheers,
Yngve

On Thursday 24 March 2011 08:51:43 PM Mark Howison wrote:
> ... and I'll throw in one more suggestion, the H5Part library:
>
> http://vis.lbl.gov/Research/H5Part/
>
> which allows you to quickly and easily dump out particle data into an HDF5 file.
>
> The data model is the same one Werner suggested: each timestep has its own group, and the particles are stored as 1D arrays within those groups. You can have a different number of particles in each timestep.
>
> For each iteration, you would do something like:
>
> file = H5PartOpenFileParallel("particles.h5", H5PART_WRITE, MPI_COMM_WORLD);
>
> for (i = 0; i < nsteps; i++) {
>     H5PartSetStep(file, i);
>     H5PartSetNumParticles(file, nparticles);
>     H5PartWriteDataFloat64(file, "x", x);
>     H5PartWriteDataFloat64(file, "y", y);
>     H5PartWriteDataFloat64(file, "z", z);
>     H5PartWriteDataFloat64(file, "px", px);
>     H5PartWriteDataFloat64(file, "py", py);
>     H5PartWriteDataFloat64(file, "pz", pz);
> }
>
> H5PartCloseFile(file);
>
> Hope that helps,
> Mark
>
> On Thu, Mar 24, 2011 at 3:35 PM, Pierre de Buyl <[email protected]> wrote:
> > Hello,
> >
> > I would like to make an additional suggestion.
> >
> > With some colleagues, we set out to devise a specification for how an HDF5 file should be laid out for data from particle-based simulations. The specification is called H5MD and is found here: http://research.colberg.org/projects/molsim/
> >
> > This is, for now, only a specification and not a library, but I think it provides a good basis for molecular simulations while being useful to other kinds of simulations.
> >
> > To handle a varying number of particles, it is possible to store the data in a [T][N][D] dataset (T is the number of timesteps, N the number of particles and D the number of spatial dimensions) in which a chunk size is defined along the particle-wise axis. That way, you can take N to be N_max, the maximum number of particles, and the space taken on disk will be zero for the non-written-to chunks.
> >
> > I hope it helps and welcome comments!
> >
> > Pierre de Buyl
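(For concreteness: with the plain HDF5 C API, the chunked [T][N][D] layout Pierre describes would look roughly like the sketch below. The dataset name "position", the sizes, and the chunk shape are only illustrative, not part of the H5MD spec.)

    #include <hdf5.h>

    int main(void)
    {
        /* [T][N][D]: unlimited time axis, N fixed at the maximum
           particle count, D = 3 spatial dimensions.  Because storage
           is chunked along the particle axis, chunks that are never
           written to occupy no space on disk. */
        hsize_t dims[3]    = {0, 1000000, 3};             /* start with T = 0 */
        hsize_t maxdims[3] = {H5S_UNLIMITED, 1000000, 3};
        hsize_t chunk[3]   = {1, 1024, 3};                /* 1 step x 1024 particles */

        hid_t file  = H5Fcreate("particles.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
        hid_t space = H5Screate_simple(3, dims, maxdims);
        hid_t dcpl  = H5Pcreate(H5P_DATASET_CREATE);
        H5Pset_chunk(dcpl, 3, chunk);                     /* chunking along N */

        hid_t dset = H5Dcreate2(file, "position", H5T_NATIVE_DOUBLE, space,
                                H5P_DEFAULT, dcpl, H5P_DEFAULT);

        H5Dclose(dset);
        H5Pclose(dcpl);
        H5Sclose(space);
        H5Fclose(file);
        return 0;
    }

(Each step is then written by extending the time axis by one and writing a [1][n][D] hyperslab into the new slice.)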
> > On Wed, 16 Mar 2011 07:15:00 -0700, Werner Benger wrote:
> >> Yngve,
> >>
> >> especially if the number of particles might change over time, using 1D arrays might be more appropriate, possibly combined with index lookup arrays that allow you to identify particles from T0 to T1 and vice versa. I'm using such a 1D layout for particles and particle trajectories as part of my F5 library; here is a coding example of how to write particle positions with some fields given on them (it's all HDF5-based):
> >>
> >> http://svn.origo.ethz.ch/wsvn/f5/doc/Particles_8c-example.html
> >>
> >> It's inefficient only for very few particles, because the overhead of the metadata structure is then more prominent, but for millions of particles it works well. I haven't tried this structure yet with a million timesteps, which would lead to a million groups. I would assume HDF5 is able to handle such a situation well, but it could make sense to bundle groups of timesteps hierarchically, too.
> >>
> >> On Wed, 16 Mar 2011 08:29:27 -0500, Yngve Inntjore Levinsen wrote:
> >>
> >>> Yes of course Francesc, I was thinking float = half of 64 bit instead of 4x 8 bit :) I was thinking that it might be beneficial to keep the size in powers of 2, so that is why I chose 1024 and not 1000. I keep it as a variable so I can easily change it.
> >>>
> >>> Werner, I was thinking that I should eventually move to a sequence of 1D arrays, but it requires slightly more rewriting. The number of lines I have to write depends on whether or not the particle is still alive. I am starting out with an equal number of particles, but have no means of knowing whether I need to write the position of a given particle zero times or one million times. Typically I have something like 1 million timesteps, but I do not write down trajectories all the time (when depends on the Monte Carlo, so there is no way to know in advance).
> >>>
> >>> Ideally I would have written all the analysis into the code itself so I didn't have to write the trajectories all the time (I have not made this choice!), but that requires too much work for me to handle at the moment. Using HDF5 will reduce the storage space needed by about a factor of 6 from my estimates, improve precision, and significantly reduce the CPU hours needed as well. This is already a great improvement!
> >>>
> >>> Cheers,
> >>> Yngve
> >>>
> >>> On Wednesday 16 March 2011 02:09:36 PM Werner Benger wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> what's the reason for using a 2D extendable dataset instead of a sequence of 1D arrays in a group, using one group per time step? How many particles and time steps do you have typically? I assume in your case the number of particles is constant over time?
> >>>>
> >>>> Cheers,
> >>>> Werner
> >>>>
> >>>> On Wed, 16 Mar 2011 03:52:10 -0500, Yngve Inntjore Levinsen <> wrote:
> >>>>
> >>>> > Dear hierarchical people,
> >>>> >
> >>>> > I have recently converted a piece of code from using a simple ascii format for output to using HDF5. What the code does is, at every iteration, dump some information about particle energy/trajectory/position to the ascii file (this is a particle tracking code).
> >>>> >
> >>>> > Initially I did the same with the HDF5 library, having an unlimited row dimension in a 2D array, using h5extend_f to extend by one element each time and writing a hyperslab of one row to the file. As some (perhaps most) of you might have guessed or known already, this was a rather bad idea. The file (without compression) was about the same size as the ascii file (but obviously with higher precision), and reading the file in subsequent analysis was at least an order of magnitude slower.
> >>>> >
> >>>> > I then realized that I probably needed to write less frequently and rather keep a semi-large hyperslab in memory. I chose a hyperslab of 1000 rows, but otherwise used the same procedure. This seems to be both fast and, with compression, to create quite a bit smaller file. I tried even larger slabs, but did not see any speed improvement in my initial testing.
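(For reference: in the C API, the buffered scheme described above, accumulating 1000 rows in memory and then extending the dataset and writing the block as a single hyperslab, looks roughly like the sketch below. The function name, the 6-column layout, and the buffer size are illustrative; Yngve's actual code uses the Fortran API.)

    #include <hdf5.h>

    #define NCOLS 6     /* e.g. x, y, z, px, py, pz per row */
    #define NBUF  1000  /* rows accumulated in memory between writes */

    /* Append nbuf buffered rows to an extendable, chunked 2D dataset:
       grow the row dimension, select the new rows as a hyperslab in
       the file, and write the whole block with one call. */
    void flush_rows(hid_t dset, const double buffer[][NCOLS],
                    hsize_t nrows_in_file, hsize_t nbuf)
    {
        hsize_t newdims[2] = {nrows_in_file + nbuf, NCOLS};
        H5Dset_extent(dset, newdims);        /* grow along the row axis */

        hid_t fspace = H5Dget_space(dset);
        hsize_t start[2] = {nrows_in_file, 0};
        hsize_t count[2] = {nbuf, NCOLS};
        H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start, NULL, count, NULL);

        hid_t mspace = H5Screate_simple(2, count, NULL);
        H5Dwrite(dset, H5T_NATIVE_DOUBLE, mspace, fspace, H5P_DEFAULT, buffer);

        H5Sclose(mspace);
        H5Sclose(fspace);
    }

(The point is that each H5Dwrite now covers 1000 rows instead of one, so far fewer extend/write cycles reach the file.)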
> >>>> > My question really was just whether there are some recommended ways to do this? I would imagine I am not the first who wants to use HDF5 in this way, dumping some data at every iteration of a given simulation, without having to keep it all in memory until the end.
> >>>> >
> >>>> > Thanks for all the explanations/suggestions/experiences related to this problem you can provide, so I can make the best design choices in my program! :)
> >>>> >
> >>>> > Cheers,
> >>>> > Yngve
> >
> > -----------------------------------------------------------
> > Pierre de Buyl
> > Physique des Systèmes Complexes et Mécanique Statistique - Université Libre de Bruxelles
> > Chemical Physics Theory Group - University of Toronto
> > web: http://homepages.ulb.ac.be/~pdebuyl/
> > -----------------------------------------------------------
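(And for completeness: the one-group-per-timestep layout that Werner and Mark recommend reduces to roughly the following sketch in the plain HDF5 C API. The "Step#%d" group naming mirrors H5Part's convention; the single "x" dataset stands in for all per-particle properties, and n may differ from step to step.)

    #include <hdf5.h>
    #include <stdio.h>

    /* Write one particle property for one timestep: a group per step,
       with each property stored as a 1D array of that step's particle
       count, so different steps may hold different numbers of particles. */
    void write_step(hid_t file, int step, const double *x, hsize_t n)
    {
        char name[32];
        snprintf(name, sizeof name, "Step#%d", step);

        hid_t grp = H5Gcreate2(file, name, H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

        hsize_t dims[1] = {n};
        hid_t space = H5Screate_simple(1, dims, NULL);
        hid_t dset  = H5Dcreate2(grp, "x", H5T_NATIVE_DOUBLE, space,
                                 H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
        H5Dwrite(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, x);

        H5Dclose(dset);
        H5Sclose(space);
        H5Gclose(grp);
    }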
