Hi Andy,
On Mar 8, 2011, at 7:09 PM, Salnikov, Andrei A. wrote:
> Hi,
>
> I'm trying to understand a performance hit that we are
> experiencing when examining the tree structure of
> our HDF5 files. Originally we observed the problem when
> using h5py, but it can be reproduced even with the h5ls
> command. I tracked it down to a significant delay in
> the call to the H5Oget_info_by_name function on a dataset
> with a large number of chunks. It looks like when the
> number of chunks in a dataset increases (in our case
> we have 1-10k chunks), the performance of H5Oget_info
> drops significantly. Looking at the I/O statistics, it
> seems that the HDF5 library performs a large number of
> small I/O operations in this case. Very little CPU time
> is spent, but the real time is measured in tens of seconds.
>
> Is this expected behavior? Can it be improved somehow
> without drastically reducing the number of chunks?
>
> One more comment about H5Oget_info: it returns a
> structure that contains a lot of different info.
> In the case of the h5py code, the only member of the
> structure that is used is "type". Could there be
> a more efficient way to determine just the type of the
> object without requiring every other piece of info?
Ah, yes, we've also noticed that in some of the applications we've worked
with (including some of the main HDF5 tools, such as h5ls). As you say,
H5Oget_info() is fairly heavyweight, since it gathers all sorts of information
about each object. I do think a lighter-weight call like "H5Oget_type" would be
useful. Is there other "lightweight" information that people would like back
for each object?
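
Until a call like that exists, one possible workaround is to open the object
and ask for its identifier type instead of requesting the full info structure.
What follows is only a minimal sketch using h5py's low-level API (h5o.open
wraps H5Oopen, h5i.get_type wraps H5Iget_type). The helper name object_type
and the demo file layout are hypothetical, and it has not been measured here
whether this actually avoids the small reads on heavily chunked datasets.

```python
import os
import tempfile

import h5py
from h5py import h5i, h5o

# Map HDF5 identifier-type codes to readable names.
_TYPE_NAMES = {
    h5i.GROUP: "group",
    h5i.DATASET: "dataset",
    h5i.DATATYPE: "datatype",
}


def object_type(loc, name):
    """Return 'group', 'dataset', 'datatype', or 'unknown' for loc[name].

    Hypothetical helper: opens the object (H5Oopen) and queries its
    identifier type (H5Iget_type) rather than filling an H5O_info_t.
    """
    # The low-level API expects the link name as bytes.
    oid = h5o.open(loc.id, name.encode("utf-8"))
    # h5py closes the identifier when oid is garbage collected.
    return _TYPE_NAMES.get(h5i.get_type(oid), "unknown")


if __name__ == "__main__":
    # Demo file with one group and one dataset split into many chunks.
    path = os.path.join(tempfile.mkdtemp(), "demo.h5")
    with h5py.File(path, "w") as f:
        f.create_group("grp")
        f.create_dataset("data", shape=(1000,), chunks=(10,), dtype="i4")

    with h5py.File(path, "r") as f:
        for name in f:
            print(name, object_type(f, name))
```

Timing this against H5Oget_info_by_name on one of the real files with
1-10k chunks would show whether it helps in practice.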
Quincey