Quincey Koziol wrote on 2011-03-10:
> Hi Andy,
> 
> On Mar 9, 2011, at 11:15 AM, Salnikov, Andrei A. wrote:
> 
>> 
>> Quincey Koziol wrote on 2011-03-09:
>>> Hi Andy,
>>> 
>>> On Mar 8, 2011, at 7:09 PM, Salnikov, Andrei A. wrote:
>>> 
>>>> Hi,
>>>> 
>>>> I'm trying to understand a performance hit that we are
>>>> experiencing when examining the tree structure of our HDF5
>>>> files. We originally observed the problem with h5py, but it
>>>> can be reproduced even with the h5ls command. I tracked it
>>>> down to a significant delay in the call to the
>>>> H5Oget_info_by_name function on a dataset with a large
>>>> number of chunks. As the number of chunks in a dataset
>>>> grows (in our case to 1-10k chunks), the performance of
>>>> H5Oget_info drops significantly. The IO statistics suggest
>>>> that the HDF5 library performs a great many small IO
>>>> operations in this case. Very little CPU time is spent, but
>>>> the real time is measured in tens of seconds.
>>>> 
>>>> Is this expected behavior? Can it be improved somehow
>>>> without drastically reducing the number of chunks?
>>>> 
>>>> One more comment about H5Oget_info - it returns a structure
>>>> that contains a lot of different info. In the h5py code, the
>>>> only member of the structure that is actually used is
>>>> "type". Could there be a more efficient way to determine
>>>> just the type of an object without retrieving every other
>>>> piece of info?
>>> 
>>>     Ah, yes, we've noticed that in some of the applications we've
>>> worked with as well (including some of the main HDF5 tools, like
>>> h5ls).  As you say, H5Oget_info() is fairly heavyweight, gathering
>>> all sorts of information about each object.  I do think a
>>> lighter-weight call like "H5Oget_type" would be useful.  Is there
>>> other "lightweight" information that people would like back for
>>> each object?
>>> 
>>>     Quincey
>>> 
>> 
>> Hi Quincey,
>> 
>> thanks for confirming this. Could you explain briefly what is
>> going on there and which part of H5O_info_t needs so many reads?
> 
>       The H5Oget_info() call is gathering information about the amount
> of space that the metadata for the dataset is using.  When there's a
> large B-tree indexing the chunks, walking it can take a fair bit of
> time.
> 
>>  Maybe removing heavyweight info from H5O_info_t is the right
>> thing to do, or creating another version of the H5O_info_t
>> structure that carries only lightweight info?
> 
>       I'm leaning toward another lightweight version.  I'm asking the
> HDF5 community to help me decide what goes into that structure besides
> the object type.
> 

Hi Quincey,

is there a chance we can get this new version in the next release?

Cheers,
Andy


_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
