Hello,

Our software has made use of the HDF5 library for a long time without any issues. Recently we have started to run into datasets far larger in number than what was previously used, and some scalability issues appear to be showing.
The HDF5 file in question contains a single group with many datasets. A specific piece of code opens every dataset one at a time and reads from it via H5Dread. Previously it was rare to have more than ~90000 datasets here, so this was never noticed, but after H5Dread has been called about ~60000 times, subsequent calls start to become increasingly slow; by about ~80000 calls it slows to a crawl (instead of processing thousands per second it processes only two or three per second).

I have tried upgrading from 1.8.8 to 1.8.9 and this seems to have helped slightly: it now becomes unbearable at around ~100000 calls instead of ~80000.

Some observations:

1) This does not appear to be due to a seek delay, or larger datasets in the middle, or anything like that. I have tried e.g. starting at the back of a group of ~500000 datasets instead of the front, and the same thing happens. I have also tried starting in various spots towards the middle, and the same behaviour can be observed.

2) If I cancel the loop, allow the software to idle for a while, and then give it another go, the same thing happens (it is fast again until a certain quantity of reads). So it appears that HDF5 may be doing something in the background once it is not busy that allows reads to be fast again?

I would greatly appreciate any thoughts on this, or ideas as to what might be going on.

Regards,
Malcolm MacLeod
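P.S. For reference, a minimal sketch of the access pattern, using the HDF5 1.8 C API. The file, group, and dataset names, the loop bound, the naming scheme, and the single-double element type are illustrative assumptions, not our actual code:

    #include <hdf5.h>
    #include <stdio.h>

    int main(void)
    {
        /* Open the file and the single group containing the datasets
         * ("big.h5" and "/data" are hypothetical names). */
        hid_t file  = H5Fopen("big.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
        hid_t group = H5Gopen2(file, "/data", H5P_DEFAULT);

        char   name[32];
        double value; /* assume each dataset holds a single double */

        for (int i = 0; i < 500000; i++) {
            /* Hypothetical sequential naming scheme. */
            snprintf(name, sizeof(name), "dset%06d", i);

            hid_t dset = H5Dopen2(group, name, H5P_DEFAULT);
            if (dset < 0)
                continue;

            /* One H5Dread per dataset; the slowdown appears after
             * roughly 60000-100000 of these calls. */
            H5Dread(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL,
                    H5P_DEFAULT, &value);

            /* Each dataset handle is closed before opening the next. */
            H5Dclose(dset);
        }

        H5Gclose(group);
        H5Fclose(file);
        return 0;
    }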