Hi all, I'm a long-time user of HDF5 (mostly via the Enzo project), but new to optimizing and really taking advantage of the library's features. We're currently seeing substantial performance problems, and we're trying to narrow them down. As background, the data in our files is structured as top-level groups (named /Grid00000001, /Grid00000002, etc.) with a fixed number of datasets hanging off each group, and the files themselves are write-once, read-many. We know in advance exactly how many groups we have and how many datasets hang off each group (or at least a reasonable upper bound), and all of our data is written out contiguously, with no chunking. A sketch of the layout is below.
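To make that concrete, here is a minimal sketch of our write pattern; the dataset name ("Density"), grid count, and dimensions are placeholders rather than our actual values:

    #include <hdf5.h>
    #include <stdio.h>

    int main(void)
    {
        hid_t file = H5Fcreate("output.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
        hsize_t dims[3] = {64, 64, 64};
        hid_t space = H5Screate_simple(3, dims, NULL);

        for (int g = 1; g <= 512; g++) {   /* grid count is known in advance */
            char name[32];
            snprintf(name, sizeof name, "/Grid%08d", g);
            hid_t grp = H5Gcreate2(file, name, H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

            /* a fixed set of datasets per group; contiguous layout, no chunking */
            hid_t dset = H5Dcreate2(grp, "Density", H5T_NATIVE_DOUBLE, space,
                                    H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
            /* ... H5Dwrite(...) for each field ... */
            H5Dclose(dset);
            H5Gclose(grp);
        }
        H5Sclose(space);
        H5Fclose(file);
        return 0;
    }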
What we've found lately is that roughly 30% of the time to read a dataset is spent just opening the individual grid groups; the remainder is the actual calls to read the data. My naive guess at the source of this behavior is that opening a group involves reading a potentially scattered on-disk index. Given our particular situation -- a fixed number of groups and datasets, and data on disk that never changes once written -- is there a mechanism or parameter we could set to speed up access to the groups and datasets? A sketch of our current read path follows.
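For reference, this is roughly what our per-grid read path looks like (again, the dataset name and type are placeholders); the ~30% overhead we measure is in the H5Gopen2 call:

    #include <hdf5.h>
    #include <stdio.h>

    void read_grid(hid_t file, int g, double *buf)
    {
        char name[32];
        snprintf(name, sizeof name, "/Grid%08d", g);

        hid_t grp  = H5Gopen2(file, name, H5P_DEFAULT);   /* the slow ~30% */
        hid_t dset = H5Dopen2(grp, "Density", H5P_DEFAULT);
        H5Dread(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, buf);
        H5Dclose(dset);
        H5Gclose(grp);
    }

Thanks for any ideas,

Matt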
