Lately I have been using HDF5's buffered write (backing store) feature to
write multiple time levels to HDF5 files. Each time level is its own group
(named with a zero-padded character string), and 3D floating-point
variables are members of each group.
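
In case it clarifies the setup, here is roughly what my writer does (a
simplified sketch using the core driver with a backing store; the
increment size, group/dataset names, and dimensions are placeholders):

  /* Simplified sketch: core (in-memory) driver with a backing store,
     one zero-padded group per time level, 3D float datasets in each. */
  #include "hdf5.h"
  #include <stdio.h>

  hid_t open_buffered_file(const char *name)
  {
      hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
      /* Buffer the whole file in memory; flush to the backing store
         on close. The 64 MB increment is just an example. */
      H5Pset_fapl_core(fapl, 64 * 1024 * 1024, 1 /* backing_store */);
      hid_t file = H5Fcreate(name, H5F_ACC_TRUNC, H5P_DEFAULT, fapl);
      H5Pclose(fapl);
      return file;
  }

  void write_time_level(hid_t file, int level, const float *u,
                        hsize_t nx, hsize_t ny, hsize_t nz)
  {
      char gname[16];
      snprintf(gname, sizeof gname, "%05d", level); /* zero-padded name */
      hid_t grp = H5Gcreate2(file, gname, H5P_DEFAULT, H5P_DEFAULT,
                             H5P_DEFAULT);

      hsize_t dims[3] = { nx, ny, nz };
      hid_t space = H5Screate_simple(3, dims, NULL);
      hid_t dset  = H5Dcreate2(grp, "u", H5T_NATIVE_FLOAT, space,
                               H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
      H5Dwrite(dset, H5T_NATIVE_FLOAT, H5S_ALL, H5S_ALL, H5P_DEFAULT, u);

      H5Dclose(dset);
      H5Sclose(space);
      H5Gclose(grp);
  }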

My concern (perhaps unfounded) is that the very small bits of what I call
metadata (integers, lists of which variables are in the file, and other
very small pieces of data that describe the 3D data and are necessary for
my reader code) will be placed after the huge 3D data, so that accessing
them will require long seeks through the 3D data. The only reason I am
worried about this is that I noticed h5dump taking more than 10 seconds to
output one of my small metadata datasets from one of my files. I got the
impression that h5dump was having to make its way through the 3D arrays
before getting to the metadata. However, my C code seemed to access the
metadata quickly, so perhaps it's an issue with h5dump.
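
For what it's worth, the slow case was dumping a single small dataset by
name, along the lines of (the path here is just an example):

  h5dump -d /00000/grid_info somefile.h5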

So I guess my question is: should I not worry about the order in which
data is written to the HDF5 file, and assume the layout is intelligent
enough that small structures/arrays/integers etc. will be accessible
quickly? If not, how do I force the small stuff to the beginning of the
file so it's quickly accessible? I will be looking at thousands of files,
each tens of GB in size, and each may contain dozens of groups (each of
which holds dozens of 3D floating-point arrays), so I am looking for every
way to squeeze out the fastest I/O I can.
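
One thing I have wondered about (untested, and I am not sure it actually
addresses the seek issue) is giving the tiny datasets a compact layout, so
their raw data is stored in the object header rather than elsewhere in the
file. Something like this, with placeholder names and values:

  /* Sketch: compact layout for a small metadata dataset, so its raw
     data lives in the object header (compact storage is limited to
     64 KB, which is plenty for this kind of thing). */
  int grid_dims[3] = { 512, 512, 128 };   /* example values */
  hsize_t dims[1]  = { 3 };
  hid_t space = H5Screate_simple(1, dims, NULL);
  hid_t dcpl  = H5Pcreate(H5P_DATASET_CREATE);
  H5Pset_layout(dcpl, H5D_COMPACT);
  hid_t dset  = H5Dcreate2(file, "grid_dims", H5T_NATIVE_INT, space,
                           H5P_DEFAULT, dcpl, H5P_DEFAULT);
  H5Dwrite(dset, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT,
           grid_dims);
  H5Dclose(dset);
  H5Pclose(dcpl);
  H5Sclose(space);

Is that the sort of thing that would help here?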

Leigh

-- 
Leigh Orf
Associate Professor of Atmospheric Science
Department of Earth and Atmospheric Science
Central Michigan University