Hi all,

I have not received any response to my last question to Neil. Does anybody else know the status of this new feature (H5Oget_info2 and H5Oget_info_by_idx2), and whether there is a chance we will see it in released code any time soon?
Cheers,
Andy

Salnikov, Andrei A. wrote on 2011-09-07:
> Hi Neil,
>
> just reviving this old thread to see if there has been any progress on this feature. Do you have an update on its status for the upcoming 1.8.8 release?
>
> Thanks,
> Andy
>
>
> Neil Fortner wrote on 2011-03-21:
>> Andy,
>>
>> On 03/20/2011 02:28 AM, Salnikov, Andrei A. wrote:
>>> Neil Fortner wrote on 2011-03-14:
>>>> Andy,
>>>>
>>>> On 03/11/2011 06:48 PM, Salnikov, Andrei A. wrote:
>>>>> Quincey Koziol wrote on 2011-03-10:
>>>>>> Hi Andy,
>>>>>>
>>>>>> On Mar 9, 2011, at 11:15 AM, Salnikov, Andrei A. wrote:
>>>>>>
>>>>>>> Quincey Koziol wrote on 2011-03-09:
>>>>>>>> Hi Andy,
>>>>>>>>
>>>>>>>> On Mar 8, 2011, at 7:09 PM, Salnikov, Andrei A. wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I'm trying to understand a performance hit that we are experiencing when examining the tree structure of our HDF5 files. We originally observed the problem when using h5py, but it can be reproduced even with the h5ls command. I tracked it down to a significant delay in the call to the H5Oget_info_by_name function on a dataset with a large number of chunks. It looks like when the number of chunks in a dataset increases (in our case we have 1-10k chunks), the performance of H5Oget_info drops significantly. Looking at the I/O statistics, it seems that the HDF5 library does very many small I/O operations in this case. Very little CPU time is spent, but the real time is measured in tens of seconds.
>>>>>>>>>
>>>>>>>>> Is this expected behavior? Can it be improved somehow without reducing the number of chunks drastically?
>>>>>>>>>
>>>>>>>>> One more comment about H5Oget_info: it returns a structure that contains a lot of different info. In the h5py code, the only member of the structure that is used is "type". Could there be a more efficient way to determine just the type of the object without retrieving every other piece of info?
>>>>>>>> Ah, yes, we've noticed that in some of the applications we've worked with also (including some of the main HDF5 tools, like h5ls, etc). As you say, H5Oget_info() is fairly heavyweight, getting all sorts of information about each object. I do think a lighter-weight call like "H5Oget_type" would be useful. Is there other "lightweight" information that people would like back for each object?
>>>>>>>>
>>>>>>>> Quincey
>>>>>>>>
>>>>>>> Hi Quincey,
>>>>>>>
>>>>>>> thanks for confirming this. Could you explain briefly what is going on there and which part of H5O_info_t needs so many reads?
>>>>>> The H5Oget_info() call is gathering information about the amount of space that the metadata for the dataset is using. When there's a large B-tree for indexing the chunks, that can take a fair bit of time to walk the B-tree.
>>>>>>
>>>>>>> Maybe removing heavyweight info from H5O_info_t is the right thing to do, or creating another version of the H5O_info_t structure which has only light-weight info?
>>>>>> I'm leaning toward another light-weight version. I'm asking the HDF5 community to help me decide what goes into that structure besides the object type.
>>>>>>
>>>>> Hi Quincey,
>>>>>
>>>>> is there a chance we can get this new version in the next release?
>>>> We actually already have an experimental branch with a similar feature mostly implemented. It allows you to specify the fields you want filled in by H5Oget_info. The branch can be found at:
>>>>
>>>> http://svn.hdfgroup.uiuc.edu/hdf5/branches/h5oget_info_by_field/
>>>>
>>>> The new functions are:
>>>>
>>>> herr_t H5Oget_info2(hid_t loc_id, H5O_info_t *oinfo, unsigned fields);
>>>> herr_t H5Oget_info_by_name2(hid_t loc_id, const char *name, H5O_info_t *oinfo, unsigned fields, hid_t lapl_id);
>>>>
>>>> The "fields" parameter can contain the following bitflags (combined with "|"):
>>>>
>>>> H5O_INFO_TIME
>>>> H5O_INFO_NUM_ATTRS
>>>> H5O_INFO_HDR
>>>> H5O_INFO_META_SIZE
>>>> H5O_INFO_ALL (== H5O_INFO_TIME | H5O_INFO_NUM_ATTRS | H5O_INFO_HDR | H5O_INFO_META_SIZE)
>>>>
>>>> Passing these flags tells the library to fill in the corresponding fields in oinfo. Other fields are always filled in because there is no performance penalty. In your case, since you only need the type, you can just pass "0". h5ls has also been modified to use these, so it should be faster.
>>>>
>>>> Of course, this is experimental code and should not be used in production, but if you're curious how much a lightweight H5Oget_info would help your performance, you're welcome to try it. If you do, we'd love to hear about your results, and also your thoughts on the interface. For maximum performance, you should configure the library with "--enable-production" (for this branch; this is not necessary for releases).
>>>>
>>>> Thanks,
>>>> -Neil
>>>>
>>> Hi Neil,
>>>
>>> I managed to build this branch and test it. It has indeed improved performance dramatically. As you suggested, I only used zero for the fields argument; other values were not included in my test. With that value, and checking only the "type" field in H5O_info_t, it runs much faster than the previous version. 'h5ls' also works better on our files.
>>>
>>> What I find interesting is that there is no version of H5Oget_info_by_idx that takes a "fields" argument. Is this function so different from H5Oget_info and H5Oget_info_by_name that it cannot be optimized?
>>>
>>> Even without H5Oget_info_by_idx2, I'd be happy to see this branch included in the next release.
>>
>> Glad to hear it improved your performance! It would be easy to add H5Oget_info_by_idx2; we just didn't do that because we only did the minimum needed to test the performance in the case we were looking at, and stopped after reaching that point. We shelved the work because it didn't make a huge difference in the case we were looking at, but with your report I will look into getting it scheduled sooner rather than later. There is a chance we may change the interface to something like what Quincey suggested. Thanks for taking the time to test this!
>>
>> -Neil
>>
>>> Cheers,
>>> Andy
>>>
>>>
>>> _______________________________________________
>>> Hdf-forum is for HDF software users discussion.
>>> [email protected]
>>> http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
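The "fields" mechanism Neil describes is essentially a bitmask: cheap members are always filled in, and each flag opts in to one more expensive group. The C sketch below is a library-free model of that idea, not HDF5 code. The flag names only mirror the proposed H5O_INFO_* flags; their numeric values, the `stored_obj_t` and `obj_info_t` types, the `get_info_fields()` helper, and the fixed per-entry cost of the simulated chunk-index walk are all hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flag values; the names only mirror the proposed H5O_INFO_* flags. */
#define INFO_TIME      0x1u
#define INFO_NUM_ATTRS 0x2u
#define INFO_HDR       0x4u
#define INFO_META_SIZE 0x8u
#define INFO_ALL (INFO_TIME | INFO_NUM_ATTRS | INFO_HDR | INFO_META_SIZE)

/* Hypothetical size of one chunk-index entry, in bytes. */
#define INDEX_ENTRY_BYTES 32u

/* Hypothetical stand-in for what the file stores about one object. */
typedef struct {
    int      type;        /* object type code: cheap to read */
    long     ctime;       /* creation time */
    unsigned num_attrs;   /* attribute count */
    unsigned hdr_nmesgs;  /* object header message count */
    unsigned nchunks;     /* entries in the chunk index (B-tree) */
} stored_obj_t;

/* Hypothetical result structure, loosely modeled on H5O_info_t. */
typedef struct {
    int                type;
    long               ctime;
    unsigned           num_attrs;
    unsigned           hdr_nmesgs;
    unsigned long long index_size;  /* only set when INFO_META_SIZE is requested */
} obj_info_t;

/* Simulates walking the chunk index: one small "read" per entry. */
static unsigned long long walk_index(const stored_obj_t *obj, unsigned *reads)
{
    unsigned long long bytes = 0;
    for (unsigned i = 0; i < obj->nchunks; i++) {
        (*reads)++;
        bytes += INDEX_ENTRY_BYTES;
    }
    return bytes;
}

/* Fills only the groups requested in `fields`; `type` is always filled.
 * `reads` receives the number of simulated small I/O operations.
 * Returns 0 on success, -1 on a NULL argument or an unknown flag bit. */
int get_info_fields(const stored_obj_t *obj, obj_info_t *info,
                    unsigned fields, unsigned *reads)
{
    if (obj == NULL || info == NULL || reads == NULL || (fields & ~INFO_ALL) != 0)
        return -1;

    memset(info, 0, sizeof *info);
    *reads = 0;

    info->type = obj->type;  /* cheap: always returned */
    if (fields & INFO_TIME)      info->ctime = obj->ctime;
    if (fields & INFO_NUM_ATTRS) info->num_attrs = obj->num_attrs;
    if (fields & INFO_HDR)       info->hdr_nmesgs = obj->hdr_nmesgs;
    if (fields & INFO_META_SIZE) info->index_size = walk_index(obj, reads);
    return 0;
}
```

In this model, passing `fields == 0` (as Andy did with the branch) leaves `reads` at 0 and sets only `type`, while `INFO_ALL` makes `reads` equal to `nchunks`, which is the toy analogue of the many small I/O operations reported for datasets with thousands of chunks.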
