Andrei,

On Oct 14, 2011, at 11:19 AM, Salnikov, Andrei A. wrote:
> Hi all,
>
> I have not received any response to my last question to Neil.
> Does anybody else know the status of this new feature
> (H5Oget_info2 and H5Oget_info_by_idx2) and whether there is a chance
> that we will see it any time soon in released code?
>

Sorry, it will not be included in this upcoming release (HDF5 1.8.8).
We will let you know as soon as it is merged into the development
branch (1.9). If the feature doesn't require a file format extension,
it has a chance to be included in a future maintenance release.

Elena

> Cheers,
> Andy
>
>
> Salnikov, Andrei A. wrote on 2011-09-07:
>> Hi Neil,
>>
>> just reviving this old thread to see if there has been any progress
>> on this feature. Do you have an update on the status for
>> the upcoming 1.8.8 release?
>>
>> Thanks,
>> Andy
>>
>>
>> Neil Fortner wrote on 2011-03-21:
>>> Andy,
>>>
>>> On 03/20/2011 02:28 AM, Salnikov, Andrei A. wrote:
>>>> Neil Fortner wrote on 2011-03-14:
>>>>> Andy,
>>>>>
>>>>> On 03/11/2011 06:48 PM, Salnikov, Andrei A. wrote:
>>>>>> Quincey Koziol wrote on 2011-03-10:
>>>>>>> Hi Andy,
>>>>>>>
>>>>>>> On Mar 9, 2011, at 11:15 AM, Salnikov, Andrei A. wrote:
>>>>>>>
>>>>>>>> Quincey Koziol wrote on 2011-03-09:
>>>>>>>>> Hi Andy,
>>>>>>>>>
>>>>>>>>> On Mar 8, 2011, at 7:09 PM, Salnikov, Andrei A. wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I'm trying to understand a performance hit that we are
>>>>>>>>>> experiencing when trying to examine the tree structure of
>>>>>>>>>> our HDF5 files. Originally we observed the problem when
>>>>>>>>>> using h5py, but it could be reproduced even with the h5ls
>>>>>>>>>> command. I tracked it down to a significant delay in
>>>>>>>>>> the call to the H5Oget_info_by_name function on a dataset
>>>>>>>>>> with a large number of chunks. It looks like when the
>>>>>>>>>> number of chunks in a dataset increases (in our case
>>>>>>>>>> we have 1-10k chunks) the performance of H5Oget_info
>>>>>>>>>> drops significantly.
>>>>>>>>>> Looking at the I/O statistics, it seems that the HDF5 library
>>>>>>>>>> does very many small I/O operations in this case. Very little
>>>>>>>>>> CPU time is spent, but the real time is measured in tens of
>>>>>>>>>> seconds.
>>>>>>>>>>
>>>>>>>>>> Is this expected behavior? Can it be improved somehow
>>>>>>>>>> without reducing the number of chunks drastically?
>>>>>>>>>>
>>>>>>>>>> One more comment about H5Oget_info: it returns a
>>>>>>>>>> structure that contains a lot of different info.
>>>>>>>>>> In the h5py code, the only member of the
>>>>>>>>>> structure that is used is "type". Could there be a
>>>>>>>>>> more efficient way to determine just the type of the
>>>>>>>>>> object without requiring every other piece of info?
>>>>>>>>> Ah, yes, we've noticed that in some of the applications we've
>>>>>>>>> worked with also (including some of the main HDF5 tools, like
>>>>>>>>> h5ls, etc.). As you say, H5Oget_info() is fairly heavyweight,
>>>>>>>>> getting all sorts of information about each object. I do think a
>>>>>>>>> lighter-weight call like "H5Oget_type" would be useful. Is
>>>>>>>>> there other "lightweight" information that people would like back
>>>>>>>>> for each object?
>>>>>>>>>
>>>>>>>>> Quincey
>>>>>>>>>
>>>>>>>> Hi Quincey,
>>>>>>>>
>>>>>>>> thanks for confirming this. Could you explain briefly what is
>>>>>>>> going on there and which part of H5O_info_t needs so many reads?
>>>>>>> The H5Oget_info() call is gathering information about the amount
>>>>>>> of space that the metadata for the dataset is using. When there's
>>>>>>> a large B-tree for indexing the chunks, it can take a fair bit
>>>>>>> of time to walk the B-tree.
>>>>>>>
>>>>>>>> Maybe removing heavyweight info from H5O_info_t is the right
>>>>>>>> thing to do, or creating another version of the H5O_info_t
>>>>>>>> structure that has only light-weight info?
>>>>>>> I'm leaning toward another light-weight version.
>>>>>>> I'm asking the
>>>>>>> HDF5 community to help me decide what goes into that structure
>>>>>>> besides the object type.
>>>>>>>
>>>>>> Hi Quincey,
>>>>>>
>>>>>> is there a chance we can get this new version in the next release?
>>>>> We actually already have an experimental branch with a similar
>>>>> feature mostly implemented. It allows you to specify the fields you
>>>>> want filled in by H5Oget_info. The branch can be found at:
>>>>>
>>>>> http://svn.hdfgroup.uiuc.edu/hdf5/branches/h5oget_info_by_field/
>>>>>
>>>>> The new functions are:
>>>>>
>>>>> herr_t H5Oget_info2(hid_t loc_id, H5O_info_t *oinfo, unsigned fields);
>>>>> herr_t H5Oget_info_by_name2(hid_t loc_id, const char *name,
>>>>>     H5O_info_t *oinfo, unsigned fields, hid_t lapl_id);
>>>>>
>>>>> The "fields" parameter can contain the following bitflags (combined
>>>>> with "|"):
>>>>>
>>>>> H5O_INFO_TIME
>>>>> H5O_INFO_NUM_ATTRS
>>>>> H5O_INFO_HDR
>>>>> H5O_INFO_META_SIZE
>>>>> H5O_INFO_ALL (== H5O_INFO_TIME | H5O_INFO_NUM_ATTRS | H5O_INFO_HDR |
>>>>>     H5O_INFO_META_SIZE)
>>>>>
>>>>> Passing these flags tells the library to fill in the corresponding
>>>>> fields in oinfo. Other fields are always filled in because there is
>>>>> no performance penalty. In your case, since you only need the type,
>>>>> you can just pass "0". h5ls has also been modified to use these, so
>>>>> it should be faster.
>>>>>
>>>>> Of course, this is experimental code and should not be used in
>>>>> production, but if you're curious how much a lightweight H5Oget_info
>>>>> would help your performance you're welcome to try it. If you do,
>>>>> we'd love to hear about your results, and also your thoughts on the
>>>>> interface. For maximum performance, you should configure the library
>>>>> with "--enable-production" (for this branch, not necessary for
>>>>> releases).
>>>>>
>>>>> Thanks,
>>>>> -Neil
>>>>>
>>>> Hi Neil,
>>>>
>>>> I managed to build this branch and test it. It has indeed improved
>>>> performance dramatically.
>>>> As you suggested, I only used a zero value for the
>>>> fields argument; other values were not included in my test. With
>>>> that value, and checking only the "type" field in H5O_info_t, it runs
>>>> much faster than the previous version. 'h5ls' also works better on
>>>> our files.
>>>>
>>>> What I find interesting is that there is no version of
>>>> H5Oget_info_by_idx that takes a "fields" argument. Is this function
>>>> so different from H5Oget_info and H5Oget_info_by_name that it cannot
>>>> be optimized?
>>>>
>>>> Even without H5Oget_info_by_idx2 I'd be happy to see this branch
>>>> included in the next release.
>>>
>>> Glad to hear it improved your performance! It would be easy to add
>>> H5Oget_info_by_idx2; we just didn't do that because we only did the
>>> minimum needed to test the performance in the case we were looking at,
>>> and stopped after reaching that point. We shelved the work because it
>>> didn't make a huge difference in the case we were looking at, but with
>>> your report I will look into getting it scheduled sooner rather than
>>> later. There is a chance we may change the interface to something like
>>> what Quincey suggested. Thanks for taking the time to test this!
>>>
>>> -Neil
>>>
>>>> Cheers,
>>>> Andy
>>>>
>>>>
>>>> _______________________________________________
>>>> Hdf-forum is for HDF software users discussion.
>>>> [email protected]
>>>> http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
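Quincey's explanation in the thread is that H5Oget_info() totals the dataset's metadata size by walking the B-tree that indexes the chunks, which would account for the many small reads and low CPU use Andy saw. A rough way to see how that walk grows with the chunk count is to count the nodes it must visit. The C sketch below is a simplified model, not HDF5 code: it assumes every node is completely full with a hypothetical `fanout` entries and treats each node as one metadata read, so the result is only a lower bound on the nodes a real walk visits.

```c
#include <stddef.h>

/* Simplified model (not HDF5 code): count the nodes in a B-tree that
 * indexes n_entries chunks, assuming every node is completely full with
 * `fanout` entries. `fanout` is an assumed parameter, not a value taken
 * from HDF5. Walking the whole tree to total its size touches each node
 * once, so under this model the result is also the number of small
 * metadata reads. Partly full nodes can only increase the count, so
 * this is a lower bound. */
size_t btree_nodes_full(size_t n_entries, size_t fanout)
{
    size_t total = 0;
    size_t level = n_entries;

    /* Empty tree, or a fanout too small to ever shrink toward a root. */
    if (n_entries == 0 || fanout < 2)
        return 0;

    /* Each level needs ceil(items below / fanout) nodes; stop at the root. */
    do {
        level = (level + fanout - 1) / fanout;
        total += level;
    } while (level > 1);

    return total;
}
```

For example, with an assumed fanout of 64, 10,000 chunks need 157 leaf nodes, 3 internal nodes and 1 root, so `btree_nodes_full(10000, 64)` returns 161. Under this model each of those nodes would be a separate small read every time the full H5O_info_t is requested, which is the cost the "fields" mask discussed in the thread lets a caller skip.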
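Neil describes the experimental "fields" argument as a bitmask: set bits request the costly parts of H5O_info_t, while cheap members such as the object type are always filled in, so passing 0 returns just the type. The C sketch below models that design so it can run without HDF5. Everything in it (the `DEMO_INFO_*` flag values, `demo_object_t`, `demo_info_t`, `demo_get_info`) is hypothetical; the flag values are chosen for illustration and are not the definitions from the experimental branch.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical flag values for this sketch only. They mirror the names in
 * Neil's message but are NOT the definitions from the HDF5 branch. */
#define DEMO_INFO_TIME      0x01u
#define DEMO_INFO_NUM_ATTRS 0x02u
#define DEMO_INFO_HDR       0x04u
#define DEMO_INFO_META_SIZE 0x08u
#define DEMO_INFO_ALL (DEMO_INFO_TIME | DEMO_INFO_NUM_ATTRS | \
                       DEMO_INFO_HDR | DEMO_INFO_META_SIZE)

typedef enum { DEMO_TYPE_GROUP, DEMO_TYPE_DATASET } demo_type_t;

/* Hypothetical stand-in for an object stored in a file. */
typedef struct {
    demo_type_t type;
    int64_t     mtime;
    unsigned    num_attrs;
    uint64_t    hdr_bytes;    /* object header size */
    uint64_t    index_bytes;  /* chunk-index size, only known after a walk */
} demo_object_t;

/* Hypothetical, much smaller analogue of H5O_info_t. */
typedef struct {
    demo_type_t type;
    int64_t     mtime;
    unsigned    num_attrs;
    uint64_t    hdr_size;
    uint64_t    meta_size;
} demo_info_t;

/* Counts runs of the simulated expensive step so the mask's effect is visible. */
static unsigned demo_expensive_walks = 0;

/* Stands in for walking the chunk B-tree with many small reads. */
static uint64_t demo_walk_index(const demo_object_t *obj)
{
    demo_expensive_walks++;
    return obj->index_bytes;
}

/* Fill only the members selected by `fields`. The type is always filled
 * because it is cheap; unrequested members are left as zero. */
void demo_get_info(const demo_object_t *obj, demo_info_t *info, unsigned fields)
{
    memset(info, 0, sizeof *info);
    info->type = obj->type;
    if (fields & DEMO_INFO_TIME)
        info->mtime = obj->mtime;
    if (fields & DEMO_INFO_NUM_ATTRS)
        info->num_attrs = obj->num_attrs;
    if (fields & DEMO_INFO_HDR)
        info->hdr_size = obj->hdr_bytes;
    if (fields & DEMO_INFO_META_SIZE)
        info->meta_size = demo_walk_index(obj);
}
```

In this model, `demo_get_info(&obj, &info, 0)` reports the type without running the simulated walk, which corresponds to the zero-valued "fields" call Andy measured, while passing `DEMO_INFO_ALL` runs the walk once. Keeping the type outside the mask follows Neil's point that members with no performance penalty are always filled in.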
