On Wednesday 03 March 2010 13:43:38 Werner Benger wrote:
> On Wed, 03 Mar 2010 11:47:18 +0100, Thorben Kröger <[email protected]> wrote:
>
> > Hello,
> >
> > We have a ~30GB HDF5 file with something like 100 million small datasets
> > in it. We need to iterate through all of them, and doing so is very slow,
> > as each one has to be loaded from disk. I also don't know if it is
> > possible to find out a proper ordering to go through them, so I suspect
> > that a lot of disk seeks may also be necessary.
> >
> > Maybe it isn't such a good idea to have so many small objects in the
> > file, but I'm stuck with this format now. What options do I have?
> >
> > I'm working now on a machine with 128GB of RAM, so my file would fit
> > comfortably inside. Is it possible to load the file completely into
> > memory to avoid all of the above problems?
>
> Yes, having many small objects is inefficient. H5Literate() just traverses
> all objects as found by name, i.e. in alphabetical order, which may or
> may not be what you want. If you want a particular order, you could create
> a table with references/links to the specific datasets and iterate over
> this table instead.
>
> Are you using Linux as your operating system? If so, you can simply copy
> your 30GB file to
>
>     /dev/shm
>
> which (usually) is a tmpfs, i.e. a ramdisk, under Linux.
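To illustrate the iteration Werner describes, here is a minimal sketch of walking the links of one group with H5Literate(); the file name "big_file.h5" and the root group are placeholders, not anything taken from the thread:

    #include <hdf5.h>
    #include <stdio.h>

    /* Called once per link in the group; return 0 to continue iterating. */
    static herr_t print_link(hid_t g_id, const char *name,
                             const H5L_info_t *info, void *op_data)
    {
        (void)g_id; (void)info; (void)op_data;
        printf("%s\n", name);
        return 0;
    }

    int main(void)
    {
        /* Placeholder file and group names -- adjust to your layout. */
        hid_t file = H5Fopen("big_file.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
        hid_t grp  = H5Gopen(file, "/", H5P_DEFAULT);

        /* H5_INDEX_NAME visits links in alphabetical order; if the group
           was created with creation-order tracking, H5_INDEX_CRT_ORDER
           visits them in the order they were written instead. */
        H5Literate(grp, H5_INDEX_NAME, H5_ITER_INC, NULL, print_link, NULL);

        H5Gclose(grp);
        H5Fclose(file);
        return 0;
    }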
I also thought of using a ramdisk, but assumed that you needed root rights
for that. As it turns out, I can copy stuff to /dev/shm :-)
Thanks, as a temporary solution this is really good :-)

> A more portable solution is to use the memory virtual file driver,
> where you can read an HDF5 file from a RAM image.
>
> Werner
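The memory virtual file driver Werner refers to is HDF5's "core" driver. A minimal sketch of opening an existing file entirely in RAM through it, again with a placeholder file name, could look like this:

    #include <hdf5.h>
    #include <stdio.h>

    int main(void)
    {
        /* Placeholder file name. */
        const char *fname = "big_file.h5";

        hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);

        /* Core (memory) driver: 64 MiB allocation increment, no backing
           store, so nothing is written back to disk on close. */
        H5Pset_fapl_core(fapl, (size_t)64 * 1024 * 1024, 0);

        /* Opening an existing file with the core driver reads the whole
           file image into RAM; all subsequent I/O is memory-only. */
        hid_t file = H5Fopen(fname, H5F_ACC_RDONLY, fapl);
        if (file < 0) {
            fprintf(stderr, "could not open %s\n", fname);
            return 1;
        }

        /* ... iterate over and read the datasets here ... */

        H5Fclose(file);
        H5Pclose(fapl);
        return 0;
    }

Whether this is feasible depends on the file fitting in available RAM; with a ~30GB file on a 128GB machine, as described above, it should.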
