Hi Matt,

On Nov 22, 2011, at 1:11 PM, Matthew Turk wrote:

> Hi all,
> 
> I'm a long-time user of HDF5 (mostly via the Enzo project), but new to
> optimizing and attempting to really take advantage of the features of
> the library.  We're seeing substantial performance problems at present,
> and we're attempting to narrow down the cause.  As a bit of background,
> the data in our files is structured such that we have top-level groups
> (of the format /Grid00000001, /Grid00000002, etc.), and off of each
> group hang a fixed number of datasets; the files themselves are
> write-once, read-many.  We know in advance exactly how many groups we
> have and how many datasets hang off of each group (or at least a
> reasonable upper bound), and all of our data is written out
> contiguously, with no chunking.
> 
> What we've found lately is that about 30% of the time to read in a
> dataset is spent just opening the individual grids.  The remainder is
> the actual calls to read the data.  My naive guess at the source of
> this behavior is that opening the groups involves reading a potentially
> distributed index.  Given our particular situation (a fixed number of
> groups and datasets, and data on disk that never changes), is there a
> particular mechanism or parameter we could set to speed up access to
> the groups and datasets?
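
        For concreteness, I'm assuming the read pattern looks roughly like
the sketch below (HDF5 1.8 C API; the file name, grid count, and the
"Density" dataset name are placeholders, not details taken from your files):

/* Time how much of each grid read goes to opening the group/dataset
 * versus the H5Dread call itself. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include "hdf5.h"

#define NUM_GRIDS 512               /* illustrative grid count */

static double now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

int main(void)
{
    double t_open = 0.0, t_read = 0.0;
    hid_t file = H5Fopen("data0001.h5", H5F_ACC_RDONLY, H5P_DEFAULT);

    for (int i = 1; i <= NUM_GRIDS; i++) {
        char name[32];
        snprintf(name, sizeof name, "/Grid%08d", i);

        double t0 = now();
        hid_t grp  = H5Gopen2(file, name, H5P_DEFAULT);
        hid_t dset = H5Dopen2(grp, "Density", H5P_DEFAULT);
        t_open += now() - t0;

        /* Size the buffer from the dataspace and read the whole dataset. */
        hid_t space = H5Dget_space(dset);
        hssize_t npts = H5Sget_simple_extent_npoints(space);
        double *buf = malloc((size_t)npts * sizeof *buf);

        t0 = now();
        H5Dread(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, buf);
        t_read += now() - t0;

        free(buf);
        H5Sclose(space);
        H5Dclose(dset);
        H5Gclose(grp);
    }
    H5Fclose(file);
    printf("open: %.3f s   read: %.3f s\n", t_open, t_read);
    return 0;
}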

        There's no distributed index, really; each group just has a local
heap with the link info in it and a B-tree that indexes those links.  How
large are the files you are accessing?  Are you using serial or parallel
access to them?  What system/file system are you using?
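
        On the write side, since you know the group and dataset counts in
advance, here is a minimal sketch of the 1.8 knobs that exist for hinting
at that (all values are illustrative, and whether they help will depend on
the answers to the questions above):

/* Sketch: write the file with the newer link storage and hint at the
 * expected number of links per group.  The phase-change and estimate
 * values below are illustrative only. */
#include "hdf5.h"

int main(void)
{
    /* File access: allow the 1.8 group/link format. */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_libver_bounds(fapl, H5F_LIBVER_LATEST, H5F_LIBVER_LATEST);
    hid_t file = H5Fcreate("data0001.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    /* Group creation: keep links compact up to a threshold, and size the
     * link storage for the expected number and length of link names. */
    hid_t gcpl = H5Pcreate(H5P_GROUP_CREATE);
    H5Pset_link_phase_change(gcpl, 16, 8);     /* max_compact, min_dense */
    H5Pset_est_link_info(gcpl, 8, 16);         /* est. links, est. name len */

    hid_t grp = H5Gcreate2(file, "/Grid00000001", H5P_DEFAULT, gcpl,
                           H5P_DEFAULT);
    /* ... create the fixed set of datasets under the group here ... */

    H5Gclose(grp);
    H5Pclose(gcpl);
    H5Fclose(file);
    H5Pclose(fapl);
    return 0;
}

Whether that buys you anything in practice will depend on the file sizes
and the system, which is why I'm asking the questions above.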

        Quincey

