Malcolm,

Please try to use the latest file format when you create the file. It should handle groups with a large number of objects more efficiently.
See the H5Pset_libver_bounds function (http://www.hdfgroup.org/HDF5/doc/RM/RM_H5P.html#Property-SetLibverBounds); use H5F_LIBVER_LATEST for the last two parameters. You can also repack an existing file with h5repack using the -L flag.

Elena

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Elena Pourmal  The HDF Group  http://hdfgroup.org
1800 So. Oak St., Suite 203, Champaign IL 61820
217.531.6112
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

On Sep 5, 2012, at 4:25 AM, Malcolm MacLeod wrote:

> Hello,
>
> Our software has for a long time made use of the HDF5 library without any
> issues. Recently we have started to run into datasets far larger than what
> was previously used and some scalability issues appear to be showing.
>
> The HDF5 file in question contains a single group with many datasets - A
> specific piece of code opens every dataset one at a time and reads from it
> via H5Dread.
>
> Previously it was rare to have more than ~90000 datasets here so this was
> never noticed - but after H5Dread has been called about ~60000 times
> subsequent calls appear to start to become increasingly slow, by about
> ~80000 calls it slows to a crawl (instead of processing 1000s a second it is
> processing only two or three per second)
>
> I have tried upgrading from 1.8.8 -> 1.8.9 and this seems to have helped
> slightly, it now becomes unbearable at around ~100000 instead of ~80000
> calls.
>
> Some observations:
> 1) This does not appear to be due to a seek delay or (larger datasets in the
> middle) or anything like that, I have tried e.g. starting at the back of a
> group of ~500000 datasets instead of the front and the same thing happens. I
> have tried also to start in various spots towards the middle and also the
> same behaviour can be observed.
> 2) If I cancel the loop, allow the software to idle for a while and then give
> it another go the same thing happens (it is fast again until a certain
> quantity of reads) - so it appears that HDF5 may be doing something in the
> background once it is not busy that allows reads to be fast again?
>
> I would greatly appreciate any thoughts on this or ideas as to what might be
> going on?
>
> Regards,
> Malcolm MacLeod
>
> _______________________________________________
> Hdf-forum is for HDF software users discussion.
> [email protected]
> http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
