Hi Malcolm, 

Doesn't sound good ;-). Would it be possible to submit a program that 
demonstrates the issue to [email protected], so we can take a look? 

Thank you!

Elena
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Elena Pourmal  The HDF Group  http://hdfgroup.org   
1800 So. Oak St., Suite 203, Champaign IL 61820
217.531.6112
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~



On Sep 8, 2012, at 3:31 AM, Malcolm MacLeod wrote:

> Hello Elena,
> 
> Sorry, I should have mentioned that. I am already setting H5F_LIBVER_LATEST
> and have recreated the file (which is what gave the slight speed boost I
> mentioned originally when upgrading), but the same issue is unfortunately
> still present.
> 
> - Malcolm
> 
> 
>> Malcolm,
>> 
>> Please try using the latest file format when you create a file. It should
>> be more efficient at handling groups with a large number of objects.
>> 
>> See the H5Pset_libver_bounds function
>> (http://www.hdfgroup.org/HDF5/doc/RM/RM_H5P.html#Property-SetLibverBounds);
>> use H5F_LIBVER_LATEST for the last two parameters.
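>> 
>> A minimal sketch of creating a file this way (the file name is just for
>> illustration):
>> 
>>    #include "hdf5.h"
>> 
>>    int main(void)
>>    {
>>        /* File access property list requesting the latest file format */
>>        hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
>>        H5Pset_libver_bounds(fapl, H5F_LIBVER_LATEST, H5F_LIBVER_LATEST);
>> 
>>        hid_t file = H5Fcreate("data.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);
>>        /* ... create groups and datasets as usual ... */
>> 
>>        H5Fclose(file);
>>        H5Pclose(fapl);
>>        return 0;
>>    }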
>> 
>> You may repack an existing file with h5repack using the -L flag.
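>> For example (the input and output file names are just placeholders):
>> 
>>    h5repack -L old_file.h5 repacked_file.h5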
>> 
>> Elena
>> 
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> Elena Pourmal  The HDF Group  http://hdfgroup.org
>> 1800 So. Oak St., Suite 203, Champaign IL 61820
>> 217.531.6112
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> 
>> On Sep 5, 2012, at 4:25 AM, Malcolm MacLeod wrote:
>>> Hello,
>>> 
>>> Our software has made use of the HDF5 library for a long time without any
>>> issues. Recently we have started to run into datasets far larger than
>>> those we used previously, and some scalability issues are appearing.
>>> 
>>> The HDF5 file in question contains a single group with many datasets. A
>>> specific piece of code opens each dataset one at a time and reads from it
>>> via H5Dread.
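>>> In outline the loop is essentially this (simplified, with illustrative
>>> names):
>>> 
>>>    for (size_t i = 0; i < n_datasets; i++) {
>>>        hid_t dset = H5Dopen2(group, names[i], H5P_DEFAULT);
>>>        H5Dread(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, buf);
>>>        H5Dclose(dset);  /* each dataset is closed before opening the next */
>>>    }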
>>> 
>>> Previously it was rare to have more than ~90000 datasets here, so this
>>> was never noticed - but after H5Dread has been called ~60000 times,
>>> subsequent calls become increasingly slow, and by ~80000 calls it slows
>>> to a crawl (instead of processing thousands per second, it processes only
>>> two or three per second).
>>> 
>>> I have tried upgrading from 1.8.8 to 1.8.9, and this seems to have helped
>>> slightly: it now becomes unbearable at around ~100000 calls instead of
>>> ~80000.
>>> 
>>> 
>>> Some observations:
>>> 1) This does not appear to be due to a seek delay, larger datasets in the
>>> middle of the group, or anything like that. I have tried, e.g., starting
>>> at the back of a group of ~500000 datasets instead of the front, and the
>>> same thing happens. I have also tried starting at various spots towards
>>> the middle, and the same behaviour can be observed.
>>> 2) If I cancel the loop, allow the software to idle for a while, and then
>>> give it another go, the same thing happens (it is fast again until a
>>> certain number of reads) - so it appears that HDF5 may be doing something
>>> in the background, while it is not busy, that allows reads to become fast
>>> again?
>>> 
>>> 
>>> I would greatly appreciate any thoughts on this, or ideas as to what
>>> might be going on.
>>> 
>>> Regards,
>>> Malcolm MacLeod
>>> 
> 

_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
