On Wed, 03 Mar 2010 11:47:18 +0100, Thorben Kröger
<[email protected]> wrote:
Hello,
We have a ~30GB HDF5 file with something like 100 million small datasets in
it. We need to iterate through all of them, and doing so is very slow as each
one has to be loaded from disk. I also don't know whether it is possible to
determine a good ordering in which to visit them, so I suspect that a lot of
disk seeks are also necessary.
Maybe it isn't such a good idea to have so many small objects in the file, but
I'm stuck with this format now. What options do I have?
I'm now working on a machine with 128 GB of RAM, so the file would fit
comfortably in memory. Is it possible to load the file completely into memory
to avoid all of the above problems?
Yes, having many small objects is inefficient. H5Literate() just traverses
all objects in name order, i.e. alphabetically, which may or may not be what
you want. If you want a particular order, you could create a table with
references/links to the specific datasets and iterate over this table
instead.
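For example, here is a rough sketch of that approach (untested; the table name
"order_table", the dataset paths and the count are placeholders, and it uses
the HDF5 1.8 object-reference API):

#include "hdf5.h"

#define NDSETS 3   /* in practice: the number of datasets to index */

int main(void)
{
    /* Paths listed in the order you want to visit the datasets. */
    const char *paths[NDSETS] = { "/g/dset_b", "/g/dset_a", "/g/dset_c" };
    hobj_ref_t  refs[NDSETS];
    hsize_t     dims[1] = { NDSETS };

    hid_t file = H5Fopen("data.h5", H5F_ACC_RDWR, H5P_DEFAULT);

    /* Build one object reference per dataset, in the desired order. */
    for (int i = 0; i < NDSETS; i++)
        H5Rcreate(&refs[i], file, paths[i], H5R_OBJECT, -1);

    /* Persist the table so later runs can reuse it. */
    hid_t space = H5Screate_simple(1, dims, NULL);
    hid_t table = H5Dcreate2(file, "order_table", H5T_STD_REF_OBJ, space,
                             H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
    H5Dwrite(table, H5T_STD_REF_OBJ, H5S_ALL, H5S_ALL, H5P_DEFAULT, refs);

    /* Iterate in table order: dereference each entry and process it. */
    for (int i = 0; i < NDSETS; i++) {
        hid_t dset = H5Rdereference(table, H5R_OBJECT, &refs[i]);
        /* ... read / process the dataset here ... */
        H5Dclose(dset);
    }

    H5Dclose(table);
    H5Sclose(space);
    H5Fclose(file);
    return 0;
}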
Are you using Linux as your operating system? If so, then you can just copy
your 30GB file to
/dev/shm
which (usually) is a tmpfs, i.e. a RAM disk, under Linux.
A more portable solution is to use the in-memory virtual file driver, which
lets you read an HDF5 file from a RAM image.
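A minimal sketch of that (untested), assuming the "core" file driver set via
H5Pset_fapl_core() and a placeholder file name; with this driver the whole
file is read into memory when it is opened:

#include "hdf5.h"

int main(void)
{
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);

    /* increment = allocation granularity of the in-memory image (64 MiB here);
     * backing_store = 0 means nothing is written back to disk on close. */
    H5Pset_fapl_core(fapl, (size_t)(64 * 1024 * 1024), 0);

    hid_t file = H5Fopen("data.h5", H5F_ACC_RDONLY, fapl);
    if (file < 0)
        return 1;

    /* ... iterate over the datasets entirely from memory ... */

    H5Fclose(file);
    H5Pclose(fapl);
    return 0;
}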
Werner
--
___________________________________________________________________________
Dr. Werner Benger, Visualization Research
Laboratory for Creative Arts and Technology (LCAT)
Center for Computation & Technology at Louisiana State University (CCT/LSU)
211 Johnston Hall, Baton Rouge, Louisiana 70803
Tel.: +1 225 578 4809 Fax.: +1 225 578-5362