On Wednesday 03 March 2010 13:43:38 Werner Benger wrote:
> On Wed, 03 Mar 2010 11:47:18 +0100, Thorben Kröger <[email protected]
heidelberg.de> wrote:
> > Hello,
> > We have a ~30GB HDF5 file containing something like 100 million small
> > datasets. We need to iterate through all of them, and doing so is very
> > slow because each one has to be loaded from disk. I also don't know
> > whether it is possible to work out a sensible traversal order, so I
> > suspect a lot of disk seeks are needed as well.
> > 
> > Maybe it isn't such a good idea to have so many small objects in the
> > file, but I'm stuck with this format now. What options do I have?
> > 
> > I'm now working on a machine with 128GB of RAM, so the whole file would
> > fit comfortably in memory. Is it possible to load the file completely
> > into memory to avoid all of the above problems?
> 
> Yes, having many small objects is inefficient. H5Literate() just traverses
> all objects in the order they are found by name, i.e. alphabetically, which
> may or may not be what you want. If you want a particular order, you could
> create a table of references/links to the specific datasets and iterate
> over that table instead.
> 
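(If I understand the reference-table idea correctly, something like the
sketch below is what I would try. This is against the HDF5 1.8 C API; the
"/ref_table" name and the names[] array holding the dataset paths in my
desired order are of course just placeholders.)

#include <hdf5.h>
#include <stdlib.h>

/* Build a 1-D dataset of object references in the desired order.
 * `names` would hold the paths of the ~100 million small datasets;
 * here it is just a placeholder. */
static void build_ref_table(hid_t file, const char *names[], size_t n)
{
    hobj_ref_t *refs = malloc(n * sizeof *refs);
    hsize_t dim = n;
    hid_t space = H5Screate_simple(1, &dim, NULL);
    hid_t dset;
    size_t i;

    for (i = 0; i < n; i++)
        H5Rcreate(&refs[i], file, names[i], H5R_OBJECT, -1);

    dset = H5Dcreate2(file, "/ref_table", H5T_STD_REF_OBJ, space,
                      H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
    H5Dwrite(dset, H5T_STD_REF_OBJ, H5S_ALL, H5S_ALL, H5P_DEFAULT, refs);

    H5Dclose(dset);
    H5Sclose(space);
    free(refs);
}

/* Later, iterate in exactly that order by dereferencing each entry. */
static void iterate_ref_table(hid_t file)
{
    hid_t dset  = H5Dopen2(file, "/ref_table", H5P_DEFAULT);
    hid_t space = H5Dget_space(dset);
    hssize_t n  = H5Sget_simple_extent_npoints(space);
    hobj_ref_t *refs = malloc((size_t)n * sizeof *refs);
    hssize_t i;

    H5Dread(dset, H5T_STD_REF_OBJ, H5S_ALL, H5S_ALL, H5P_DEFAULT, refs);

    for (i = 0; i < n; i++) {
        hid_t obj = H5Rdereference(dset, H5R_OBJECT, &refs[i]);
        /* ... read the small dataset behind `obj` here ... */
        H5Dclose(obj);
    }

    free(refs);
    H5Sclose(space);
    H5Dclose(dset);
}
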
> Are you using Linux as your operating system? If so, then you can just copy
> your 30GB file to
> 
>   /dev/shm
> 
> which (usually) is a tmpfs, i.e. a ramdisk under Linux.

I had also considered using a ramdisk, but assumed that you needed root
privileges for that. As it turns out, I can simply copy files to /dev/shm :-)
Thanks, as a temporary solution this works really well :-)
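
For completeness, once the copy sits in /dev/shm nothing special seems to be
needed on the HDF5 side; as far as I can tell the tmpfs copy is opened like
any ordinary file (the file name below is just a placeholder):

#include <hdf5.h>

int main(void)
{
    /* After e.g. `cp big.h5 /dev/shm/`, open the in-RAM copy read-only;
     * all subsequent reads are served from memory by the tmpfs. */
    hid_t file = H5Fopen("/dev/shm/big.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
    if (file < 0)
        return 1;

    /* ... iterate over the datasets here ... */

    H5Fclose(file);
    return 0;
}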

> 
> A more portable solution is to use the memory virtual file driver,
> which lets you read an HDF5 file from a RAM image.
> 
>       Werner
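
Thanks, I will look into the memory virtual file driver as the more permanent
fix. As far as I understand, this is the "core" driver, selected via a file
access property list, roughly as below (the 64 MB increment and the file name
are arbitrary placeholder choices on my part):

#include <hdf5.h>

int main(void)
{
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);

    /* Core (in-memory) driver: on open, the whole file is read into a RAM
     * image and all further access goes against that image.
     * increment     = growth granularity of the memory image,
     * backing_store = 0 means changes are not written back to disk on close. */
    H5Pset_fapl_core(fapl, 64 * 1024 * 1024, 0);

    hid_t file = H5Fopen("big.h5", H5F_ACC_RDONLY, fapl);
    if (file < 0)
        return 1;

    /* ... iterate over the ~100 million small datasets, now served from RAM ... */

    H5Fclose(file);
    H5Pclose(fapl);
    return 0;
}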
