On a related note, I've just found this piece of information which might accelerate our program as well:
---
http://www.hdfgroup.org/HDF5/doc/ADGuide/CompatFormat180.html

H5Pset_libver_bounds( hid_t fapl_id, H5F_libver_t low, H5F_libver_t high )
H5Pget_libver_bounds( hid_t fapl_id, H5F_libver_t* low, H5F_libver_t* high )

Compact-or-indexed groups enable much-compressed link storage for groups
with very few members and improved efficiency and performance for groups
with very large numbers of members. The efficiency and performance impacts
are most noticeable at the extremes: all unnecessary overhead is eliminated
for groups with zero members; groups with tens of thousands of members may
see as much as a 100-fold performance gain.

Default behavior: If H5Pset_libver_bounds is not called with low equal to
H5F_LIBVER_LATEST, then the HDF5 Library provides the greatest possible
format compatibility. It does this by creating objects with the earliest
possible format that will handle the data being stored and accommodate the
action being taken.
---

Although the 30GB file I was talking about was written with HDF5 1.8.4, if I
understand correctly it will not make use of these new features, because the
library tries to maintain backward compatibility with 1.6. Correct?

Is there a tool available that converts an existing file to the new file
format so it can take advantage of all these performance improvements? Or
should I hack this into h5repack.c myself? I'd like to avoid that...

Cheers,
Thorben

On Wednesday 03 March 2010 11:47:18 Thorben Kröger wrote:
> Hello,
> We have a ~30GB HDF5 file with something like 100 million small datasets
> in it. We need to iterate through all of them, and doing so is very slow
> because each one has to be loaded from disk. I also don't know whether it
> is possible to work out a sensible ordering to traverse them, so I suspect
> a lot of disk seeks may be necessary as well.
>
> Maybe it isn't such a good idea to have so many small objects in the file,
> but I'm stuck with this format now. What options do I have?
>
> I'm now working on a machine with 128GB of RAM, so my file would fit
> comfortably inside. Is it possible to load the file completely into memory
> to avoid all of the above problems?
>
> Thanks,
> Thorben
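
For illustration, here is a minimal sketch of how the version bounds quoted
above would be applied when creating a file, so that new objects are written
in the 1.8 format; the file name and the lack of error checking are just
placeholders:

#include "hdf5.h"

int main(void)
{
    /* Ask the library for the newest object formats (e.g. compact/indexed
       group storage) instead of staying 1.6-compatible by default. */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_libver_bounds(fapl, H5F_LIBVER_LATEST, H5F_LIBVER_LATEST);

    /* A file created with this FAPL writes its objects in the latest format. */
    hid_t file = H5Fcreate("new_format.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    /* ... create groups and datasets here ... */

    H5Fclose(file);
    H5Pclose(fapl);
    return 0;
}

As for converting an existing file, h5repack in the 1.8 series should already
accept a -L / --latest option that rewrites a file using the newest format,
which may avoid having to patch h5repack.c at all; worth confirming with
h5repack --help on your installation.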
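
On the question in the quoted message about holding the whole file in RAM:
one option is the HDF5 core (memory) file driver, which reads an existing
file entirely into memory when it is opened, so later dataset reads avoid
disk seeks. A minimal sketch, assuming read-only access, with the file name
and the increment size as placeholders:

#include "hdf5.h"

int main(void)
{
    /* Core (in-memory) file driver: the increment is how much extra memory
       is allocated whenever the in-memory image must grow; backing_store = 0
       means nothing is written back to the file on disk when it is closed. */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_core(fapl, (size_t)64 * 1024 * 1024, 0);

    /* Opening an existing file with this driver reads the whole file into
       memory; subsequent dataset reads are then served from RAM. */
    hid_t file = H5Fopen("data.h5", H5F_ACC_RDONLY, fapl);

    /* ... iterate over the many small datasets here ... */

    H5Fclose(file);
    H5Pclose(fapl);
    return 0;
}

A ~30GB file image should fit comfortably in 128GB of RAM, though opening it
this way still pays the one-time cost of reading the whole file from disk.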
