Hi Andy,
On Mar 9, 2011, at 11:15 AM, Salnikov, Andrei A. wrote:
>
> Quincey Koziol wrote on 2011-03-09:
>> Hi Andy,
>>
>> On Mar 8, 2011, at 7:09 PM, Salnikov, Andrei A. wrote:
>>
>>> Hi,
>>>
>>> I'm trying to understand a performance hit that we are
>>> experiencing while examining the tree structure of
>>> our HDF5 files. We originally observed the problem when
>>> using h5py, but it can be reproduced even with the h5ls
>>> command. I tracked it down to a significant delay in
>>> the call to the H5Oget_info_by_name function on a dataset
>>> with a large number of chunks. It looks like when the
>>> number of chunks in a dataset increases (in our case
>>> we have 1-10k chunks), the performance of H5Oget_info
>>> drops significantly. Looking at the I/O statistics, it
>>> seems that the HDF5 library performs very many small I/O
>>> operations in this case. Very little CPU time is spent,
>>> but the real time is measured in tens of seconds.
>>>
>>> Is this expected behavior? Can it be improved somehow
>>> without drastically reducing the number of chunks?
>>>
>>> One more comment about H5Oget_info - it returns a
>>> structure that contains a lot of different info.
>>> In the h5py code, the only member of the structure
>>> that is actually used is "type". Could there be a
>>> more efficient way to determine just the type of the
>>> object without retrieving every other piece of info?
>>
>> Ah, yes, we've noticed that in some of the applications we've worked
>> with as well (including some of the main HDF5 tools, like h5ls, etc.). As you
>> say, H5Oget_info() is fairly heavyweight, gathering all sorts of information
>> about each object. I do think a lighter-weight call like "H5Oget_type"
>> would be useful. Is there any other "lightweight" information that people
>> would like returned for each object?
>>
>> Quincey
>>
>
> Hi Quincey,
>
> Thanks for confirming this. Could you explain briefly what is
> going on there and which part of H5O_info_t requires so many reads?
The H5Oget_info() call gathers information about the amount of space
that the dataset's metadata is using. When there's a large B-tree indexing
the chunks, walking that B-tree takes a fair bit of time, and each node visited
is another small read, which accounts for the I/O pattern you're seeing.
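For concreteness, here is a minimal sketch that isolates the call in
question, assuming the 1.8 C API; the file name "data.h5" and dataset
name "ds" are just placeholders for a file with a heavily chunked dataset:

#include <stdio.h>
#include <sys/time.h>
#include "hdf5.h"

/* Wall-clock time in seconds */
static double now(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec / 1e6;
}

int main(void)
{
    hid_t      file;
    H5O_info_t oinfo;
    double     t0, t1;

    file = H5Fopen("data.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
    if (file < 0)
        return 1;

    t0 = now();
    /* Fills every field of H5O_info_t, including meta_size, which
     * requires walking the dataset's chunk-index B-tree. */
    if (H5Oget_info_by_name(file, "ds", &oinfo, H5P_DEFAULT) < 0)
        return 1;
    t1 = now();

    printf("type = %d, chunk index = %llu bytes, wall time = %.2f s\n",
           (int)oinfo.type,
           (unsigned long long)oinfo.meta_size.obj.index_size,
           t1 - t0);

    H5Fclose(file);
    return 0;
}

The meta_size.obj.index_size value reported there is the size of the
chunk B-tree, which is what has to be walked in order to fill it in.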
> Maybe removing the heavyweight info from H5O_info_t is the right
> thing to do, or creating another version of the H5O_info_t structure
> which has only light-weight info?
I'm leaning toward another light-weight version. I'm asking the HDF5
community to help me decide what goes into that structure besides the object
type.
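In the meantime, if the object type is really all that's needed, one
possible workaround is to open the object and ask for the type of the
returned identifier, instead of requesting the full H5O_info_t. A sketch
using only the standard 1.8 API follows; whether it actually avoids the
chunk B-tree walk in your case should be verified by profiling, and
"data.h5" / "ds" are again placeholder names:

#include <stdio.h>
#include "hdf5.h"

/* Determine only the object type, without the full H5O_info_t. */
static H5O_type_t object_type(hid_t loc, const char *name)
{
    hid_t      obj  = H5Oopen(loc, name, H5P_DEFAULT);
    H5O_type_t type = H5O_TYPE_UNKNOWN;

    if (obj >= 0) {
        switch (H5Iget_type(obj)) {
            case H5I_GROUP:    type = H5O_TYPE_GROUP;          break;
            case H5I_DATASET:  type = H5O_TYPE_DATASET;        break;
            case H5I_DATATYPE: type = H5O_TYPE_NAMED_DATATYPE; break;
            default:           break;
        }
        H5Oclose(obj);
    }
    return type;
}

int main(void)
{
    hid_t file = H5Fopen("data.h5", H5F_ACC_RDONLY, H5P_DEFAULT);

    if (file < 0)
        return 1;
    printf("object type = %d\n", (int)object_type(file, "ds"));
    H5Fclose(file);
    return 0;
}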
Quincey