Hi all, I'm a long-time user of HDF5 (mostly via the Enzo project), but new to optimizing and really taking advantage of the library's features. We're currently seeing substantial performance problems, and we're trying to narrow them down. As background, the data in our files is structured as top-level groups (named /Grid00000001, /Grid00000002, etc.) with a fixed number of datasets hanging off each group, and the files themselves are write-once, read-many. We know in advance exactly how many groups we have and how many datasets hang off each group (or at least a reasonable upper bound), and all of our data is written out contiguously, with no chunking. A sketch of the layout is below.
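To make that concrete, here is a minimal sketch of our write pattern; the dataset name ("Density"), grid count, and dimensions are placeholders rather than our actual values:

    #include <hdf5.h>
    #include <stdio.h>

    int main(void)
    {
        hid_t file = H5Fcreate("output.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
        hsize_t dims[3] = {64, 64, 64};
        hid_t space = H5Screate_simple(3, dims, NULL);

        for (int g = 1; g <= 512; g++) {   /* grid count is known in advance */
            char name[32];
            snprintf(name, sizeof name, "/Grid%08d", g);
            hid_t grp = H5Gcreate2(file, name, H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

            /* a fixed set of datasets per group; contiguous layout, no chunking */
            hid_t dset = H5Dcreate2(grp, "Density", H5T_NATIVE_DOUBLE, space,
                                    H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
            /* ... H5Dwrite(...) for each field ... */
            H5Dclose(dset);
            H5Gclose(grp);
        }
        H5Sclose(space);
        H5Fclose(file);
        return 0;
    }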
What we've found lately is that roughly 30% of the time to read a dataset is spent just opening the individual grid groups; the remainder is the actual calls to read the data. My naive guess at the source of this behavior is that opening a group involves reading a potentially scattered on-disk index. Given our particular situation -- a fixed number of groups and datasets, and data on disk that never changes once written -- is there a mechanism or parameter we could set to speed up access to the groups and datasets? A sketch of our current read path follows.
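For reference, this is roughly what our per-grid read path looks like (again, the dataset name and type are placeholders); the ~30% overhead we measure is in the H5Gopen2 call:

    #include <hdf5.h>
    #include <stdio.h>

    void read_grid(hid_t file, int g, double *buf)
    {
        char name[32];
        snprintf(name, sizeof name, "/Grid%08d", g);

        hid_t grp  = H5Gopen2(file, name, H5P_DEFAULT);   /* the slow ~30% */
        hid_t dset = H5Dopen2(grp, "Density", H5P_DEFAULT);
        H5Dread(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, buf);
        H5Dclose(dset);
        H5Gclose(grp);
    }

Thanks for any ideas,

Matt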
