Hi all,

I have not received a response to my last question to Neil.
Does anybody else know the status of this new feature
(H5Oget_info2 and H5Oget_info_by_idx2), and whether there is a chance
that we will see it in released code any time soon?

Cheers,
Andy


Salnikov, Andrei A. wrote on 2011-09-07:
> Hi Neil,
> 
> just reviving this old thread to see if there has been any progress
> on this feature. Do you have an update on its status for
> the upcoming 1.8.8 release?
> 
> Thanks,
> Andy
> 
> 
> Neil Fortner wrote on 2011-03-21:
>> Andy,
>> 
>> On 03/20/2011 02:28 AM, Salnikov, Andrei A. wrote:
>>> Neil Fortner wrote on 2011-03-14:
>>>> Andy,
>>>> 
>>>> On 03/11/2011 06:48 PM, Salnikov, Andrei A. wrote:
>>>>> Quincey Koziol wrote on 2011-03-10:
>>>>>> Hi Andy,
>>>>>> 
>>>>>> On Mar 9, 2011, at 11:15 AM, Salnikov, Andrei A. wrote:
>>>>>> 
>>>>>>> Quincey Koziol wrote on 2011-03-09:
>>>>>>>> Hi Andy,
>>>>>>>> 
>>>>>>>> On Mar 8, 2011, at 7:09 PM, Salnikov, Andrei A. wrote:
>>>>>>>> 
>>>>>>>>> Hi,
>>>>>>>>> 
>>>>>>>>> I'm trying to understand a performance hit that we are
>>>>>>>>> experiencing when examining the tree structure of
>>>>>>>>> our HDF5 files. Originally we observed the problem when
>>>>>>>>> using h5py, but it can be reproduced even with the h5ls
>>>>>>>>> command. I tracked it down to a significant delay in
>>>>>>>>> the call to the H5Oget_info_by_name function on a dataset
>>>>>>>>> with a large number of chunks. It looks like when the
>>>>>>>>> number of chunks in a dataset increases (in our case
>>>>>>>>> we have 1-10k chunks) the performance of H5Oget_info
>>>>>>>>> drops significantly. Looking at the IO statistics, it
>>>>>>>>> seems that the HDF5 library does a great many small IO
>>>>>>>>> operations in this case. Very little CPU time is spent, but
>>>>>>>>> real time is measured in tens of seconds.
>>>>>>>>> 
>>>>>>>>> Is this expected behavior? Can it be improved somehow
>>>>>>>>> without drastically reducing the number of chunks?
>>>>>>>>> 
>>>>>>>>> One more comment about H5Oget_info: it returns a
>>>>>>>>> structure that contains a lot of different info.
>>>>>>>>> In the h5py code, the only member of the structure
>>>>>>>>> that is used is "type". Could there be a more
>>>>>>>>> efficient way to determine just the type of the
>>>>>>>>> object without requiring every other piece of info?
>>>>>>>> Ah, yes, we've noticed that in some of the applications we've
>>>>>>>> worked with also (including some of the main HDF5 tools, like
>>>>>>>> h5ls, etc). As you say, H5Oget_info() is fairly heavyweight,
>>>>>>>> getting all sorts of information about each object.  I do think a
>>>>>>>> lighter-weight call like "H5Oget_type" would be useful.  Is
>>>>>>>> there other "lightweight" information that people would like back
>>>>>>>> for each object?
>>>>>>>> 
>>>>>>>>        Quincey
>>>>>>>> 
>>>>>>> Hi Quincey,
>>>>>>> 
>>>>>>> thanks for confirming this. Could you explain briefly what is
>>>>>>> going on there and which part of H5O_info_t needs so many reads?
>>>>>> The H5Oget_info() call is gathering information about the amount
>>>>>> of space that the metadata for the dataset is using.  When there's
>>>>>> a large B-tree for indexing the chunks, that can take a fair bit
>>>>>> of time to walk the B-tree.
>>>>>> 
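To picture why that walk is slow, here is a minimal stand-alone C sketch. It is not the library's code: demo_node, demo_build, demo_total_size and demo_free are hypothetical names, and the fan-out of 4 and the 512-byte node size are made-up numbers. It only illustrates that a total metadata size cannot be computed without touching every index node, and on a real file each touch may well be a separate small read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_FANOUT 4   /* hypothetical fan-out, chosen only for the demo */

/* Hypothetical stand-in for one node of a chunk index. */
typedef struct demo_node {
    size_t            size_bytes;           /* space this node occupies */
    struct demo_node *child[DEMO_FANOUT];   /* NULL where there is no child */
} demo_node;

/* Build a complete tree with the given number of levels (1 = root only). */
demo_node *demo_build(unsigned levels)
{
    demo_node *n;
    int i;

    if (levels == 0)
        return NULL;
    n = calloc(1, sizeof *n);
    if (n == NULL)
        return NULL;
    n->size_bytes = 512;                    /* made-up node size */
    if (levels > 1)
        for (i = 0; i < DEMO_FANOUT; i++)
            n->child[i] = demo_build(levels - 1);
    return n;
}

/* Sum the size of every node. *visited counts how many nodes were touched;
 * in a file-backed index each touch could be one small read. */
size_t demo_total_size(const demo_node *n, size_t *visited)
{
    size_t total;
    int i;

    assert(visited != NULL);
    if (n == NULL)
        return 0;
    (*visited)++;
    total = n->size_bytes;
    for (i = 0; i < DEMO_FANOUT; i++)
        total += demo_total_size(n->child[i], visited);
    return total;
}

/* Release the whole tree. */
void demo_free(demo_node *n)
{
    int i;

    if (n == NULL)
        return;
    for (i = 0; i < DEMO_FANOUT; i++)
        demo_free(n->child[i]);
    free(n);
}
```

With 5 levels this demo tree has 1 + 4 + 16 + 64 + 256 = 341 nodes, and demo_total_size has to visit all 341 of them. Reading just an object type, by contrast, would not need to descend into the index at all.
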
>>>>>>>    Maybe removing the heavyweight info from H5O_info_t is the right
>>>>>>> thing to do, or creating another version of the H5O_info_t structure
>>>>>>> which has only light-weight info?
>>>>>>  I'm leaning toward another light-weight version.  I'm asking the
>>>>>> HDF5 community to help me decide what goes into that structure
>>>>>> besides the object type.
>>>>>> 
>>>>> Hi Quincey,
>>>>> 
>>>>> is there a chance we can get this new version in the next release?
>>>> We actually already have an experimental branch with a similar
>>>> feature mostly implemented.  It allows you to specify the fields you
>>>> want filled in by H5Oget_info.  The branch can be found at:
>>>> 
>>>> http://svn.hdfgroup.uiuc.edu/hdf5/branches/h5oget_info_by_field/
>>>> 
>>>> The new functions are:
>>>> 
>>>> herr_t H5Oget_info2(hid_t loc_id, H5O_info_t *oinfo, unsigned fields);
>>>> herr_t H5Oget_info_by_name2(hid_t loc_id, const char *name,
>>>>                             H5O_info_t *oinfo, unsigned fields,
>>>>                             hid_t lapl_id);
>>>> 
>>>> The "fields" parameter can contain the following bitflags (combined
>>>> with "|"):
>>>> 
>>>> H5O_INFO_TIME
>>>> H5O_INFO_NUM_ATTRS
>>>> H5O_INFO_HDR
>>>> H5O_INFO_META_SIZE
>>>> H5O_INFO_ALL (== H5O_INFO_TIME | H5O_INFO_NUM_ATTRS | H5O_INFO_HDR |
>>>>                  H5O_INFO_META_SIZE)
>>>> 
>>>> Passing these flags tells the library to fill in the corresponding
>>>> fields in oinfo.  Other fields are always filled in because there is
>>>> no performance penalty.  In your case, since you only need the type,
>>>> you can just pass "0".  h5ls has also been modified to use these, so
>>>> it should be faster.
>>>> 
>>>> Of course, this is experimental code and should not be used in
>>>> production, but if you're curious how much a lightweight H5Oget_info
>>>> would help your performance you're welcome to try it.  If you do,
>>>> we'd love to hear about your results, and also your thoughts on the
>>>> interface.  For maximum performance, you should configure the library
>>>> with "--enable-production" (for this branch, not necessary for
>>>> releases).
>>>> 
>>>> Thanks,
>>>> -Neil
>>>> 
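For anyone who wants to see the shape of the "fields" idea without building the branch, here is a minimal stand-alone C sketch. Everything in it is a hypothetical stand-in: the DEMO_INFO_* flags only mirror the H5O_INFO_* names quoted above, demo_info_t is not H5O_info_t, and demo_get_info takes a chunk count instead of an hid_t so that it compiles without the HDF5 headers. The values it fills in are invented. It shows only the pattern: cheap members are always filled in, and the expensive index walk runs only when its bit is set.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical flags, modelled on the H5O_INFO_* names in the message above. */
#define DEMO_INFO_TIME       0x0001u
#define DEMO_INFO_NUM_ATTRS  0x0002u
#define DEMO_INFO_HDR        0x0004u
#define DEMO_INFO_META_SIZE  0x0008u
#define DEMO_INFO_ALL        (DEMO_INFO_TIME | DEMO_INFO_NUM_ATTRS | \
                              DEMO_INFO_HDR | DEMO_INFO_META_SIZE)

typedef enum { DEMO_TYPE_GROUP, DEMO_TYPE_DATASET } demo_type_t;

/* Hypothetical, much-reduced stand-in for an object info structure. */
typedef struct {
    demo_type_t   type;       /* cheap: always filled in */
    long          mtime;      /* only with DEMO_INFO_TIME */
    unsigned      num_attrs;  /* only with DEMO_INFO_NUM_ATTRS */
    unsigned long hdr_size;   /* only with DEMO_INFO_HDR */
    unsigned long meta_size;  /* only with DEMO_INFO_META_SIZE (expensive) */
} demo_info_t;

/* Counts how often the expensive path ran, so the effect is observable. */
unsigned demo_walk_count = 0;

/* Stand-in for walking a chunk index: cost grows with the chunk count. */
unsigned long demo_walk_chunk_index(unsigned nchunks)
{
    unsigned long total = 0;
    unsigned i;

    demo_walk_count++;
    for (i = 0; i < nchunks; i++)
        total += 32ul;        /* made-up per-chunk index cost in bytes */
    return total;
}

/* Fill in only what the caller asked for; unrequested members stay zero. */
int demo_get_info(unsigned nchunks, demo_info_t *oinfo, unsigned fields)
{
    assert(oinfo != NULL);
    memset(oinfo, 0, sizeof *oinfo);

    oinfo->type = DEMO_TYPE_DATASET;      /* free to provide, so always set */

    if (fields & DEMO_INFO_TIME)
        oinfo->mtime = 1300000000L;       /* invented value */
    if (fields & DEMO_INFO_NUM_ATTRS)
        oinfo->num_attrs = 3u;            /* invented value */
    if (fields & DEMO_INFO_HDR)
        oinfo->hdr_size = 272ul;          /* invented value */
    if (fields & DEMO_INFO_META_SIZE)
        oinfo->meta_size = demo_walk_chunk_index(nchunks);

    return 0;
}
```

Calling demo_get_info(10000, &info, 0) sets info.type and never enters demo_walk_chunk_index, which corresponds to the "just pass 0" case described above. Passing DEMO_INFO_ALL triggers the walk once and fills in meta_size.
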
>>> Hi Neil,
>>> 
>>> I managed to build this branch and test it. It has indeed improved
>>> performance dramatically. As you suggested, I only used zero for the
>>> fields argument; other values were not included in my test. With
>>> that value, and checking only the "type" field in H5O_info_t, it runs
>>> much faster than the previous version. 'h5ls' also works better on our
>>> files.
>>> 
>>> What I find interesting is that there is no version of
>>> H5Oget_info_by_idx that takes a "fields" argument. Is this function so
>>> different from H5Oget_info and H5Oget_info_by_name that it cannot be
>>> optimized?
>>> 
>>> Even without H5Oget_info_by_idx2 I'd be happy to see this branch
>>> included in the next release.
>> 
>> Glad to hear it improved your performance!  It would be easy to add
>> H5Oget_info_by_idx2; we just didn't do that because we only did the
>> minimum needed to test the performance in the case we were looking at,
>> and stopped after reaching that point.  We shelved the work because it
>> didn't make a huge difference in the case we were looking at, but with
>> your report I will look into getting it scheduled sooner rather than
>> later.  There is a chance we may change the interface to something like
>> what Quincey suggested.  Thanks for taking the time to test this!
>> 
>> -Neil
>> 
>>> Cheers,
>>> Andy
>>> 
>>> 
>>> _______________________________________________
>>> Hdf-forum is for HDF software users discussion.
>>> [email protected]
>>> http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
