Quincey Koziol wrote on 2011-03-10:
> Hi Andy,
>
> On Mar 9, 2011, at 11:15 AM, Salnikov, Andrei A. wrote:
>
>> Quincey Koziol wrote on 2011-03-09:
>>> Hi Andy,
>>>
>>> On Mar 8, 2011, at 7:09 PM, Salnikov, Andrei A. wrote:
>>>
>>>> Hi,
>>>>
>>>> I'm trying to understand a performance hit that we are
>>>> experiencing when examining the tree structure of our HDF5
>>>> files. We originally observed the problem when using h5py,
>>>> but it can be reproduced even with the h5ls command. I
>>>> tracked it down to a significant delay in the call to the
>>>> H5Oget_info_by_name function on a dataset with a large
>>>> number of chunks. It looks like as the number of chunks in
>>>> a dataset increases (in our case we have 1-10k chunks), the
>>>> performance of H5Oget_info drops significantly. Looking at
>>>> the IO statistics, it seems that the HDF5 library does very
>>>> many small IO operations in this case. Very little CPU time
>>>> is spent, but the real time is measured in tens of seconds.
>>>>
>>>> Is this expected behavior? Can it be improved somehow
>>>> without drastically reducing the number of chunks?
>>>>
>>>> One more comment about H5Oget_info: it returns a structure
>>>> that contains a lot of different info. In the h5py code, the
>>>> only member of the structure that is used is "type". Could
>>>> there be a more efficient way to determine just the type of
>>>> the object without requiring every other piece of info?
>>>
>>> Ah, yes, we've noticed that in some of the applications we've
>>> worked with as well (including some of the main HDF5 tools,
>>> like h5ls, etc.). As you say, H5Oget_info() is fairly
>>> heavyweight, getting all sorts of information about each
>>> object. I do think a lighter-weight call like "H5Oget_type"
>>> would be useful. Is there other "lightweight" information
>>> that people would like back for each object?
>>>
>>> Quincey
>>
>> Hi Quincey,
>>
>> thanks for confirming this. Could you explain briefly what is
>> going on there and which part of H5O_info_t needs so many reads?
>
> The H5Oget_info() call is gathering information about the amount
> of space that the metadata for the dataset is using. When there's
> a large B-tree for indexing the chunks, it can take a fair bit of
> time to walk that B-tree.
>
>> Maybe removing the heavyweight info from H5O_info_t is the right
>> thing to do, or creating another version of the H5O_info_t
>> structure which has only the light-weight info?
>
> I'm leaning toward another light-weight version. I'm asking the
> HDF5 community to help me decide what goes into that structure
> besides the object type.
Hi Quincey,

is there a chance we can get this new version in the next release?

Cheers,
Andy
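
P.S. For concreteness, the call pattern discussed above boils down to something like the sketch below (a minimal example against the HDF5 1.8 C API, not our actual code; the file name and dataset path are placeholders):

#include "hdf5.h"
#include <stdio.h>

int main(void)
{
    /* Open an existing file read-only; "data.h5" is a placeholder. */
    hid_t file = H5Fopen("data.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
    if (file < 0)
        return 1;

    /* This single call is where the time goes when the dataset has
     * thousands of chunks: filling in the metadata-size fields of
     * H5O_info_t means walking the chunk B-tree, even though a caller
     * such as h5py or h5ls may only ever look at oinfo.type. */
    H5O_info_t oinfo;
    if (H5Oget_info_by_name(file, "/path/to/chunked_dataset",
                            &oinfo, H5P_DEFAULT) >= 0)
        printf("object type: %d\n", (int)oinfo.type);

    H5Fclose(file);
    return 0;
}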

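P.P.S. On the "H5Oget_type" idea: for our use case even something as small as the prototypes below would be enough. This is only a hypothetical sketch to illustrate the shape of the lightweight call being discussed; no such functions exist in the library today:

/* Hypothetical prototypes, shown for illustration only; these are not
 * existing HDF5 functions.  The idea is to return just the object type
 * without gathering the metadata-size information, so that no chunk
 * B-tree walk is needed. */
herr_t H5Oget_type(hid_t obj_id, H5O_type_t *otype);
herr_t H5Oget_type_by_name(hid_t loc_id, const char *name,
                           H5O_type_t *otype, hid_t lapl_id);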