I've been doing some experimenting to try to determine whether there is
an optimal read/write size and, if so, what it is, but I find there are
questions I need answers to that I haven't found yet... Hopefully
someone here can provide them :-)
Our data is going to be stored in datasets consisting of anywhere from
~1K up to ~1M or even ~10M records of a relatively small (~170-byte)
compound type. My read tests suggest that the optimal number of records
to read from the file at once is between 2048 and 4096. The data was
compressed for these tests.
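In case it helps, here's a stripped-down sketch of the kind of read loop
my tests use; the compound type, field names, and file/dataset names
below are just placeholders, not my actual schema:

/* Simplified sketch of the read test: pull the records back in
 * blocks of 4096 via hyperslab selections.  Type and names are
 * placeholders. */
#include "hdf5.h"
#include <stdlib.h>

typedef struct {
    long long id;
    double    values[20];   /* pads the record out to ~170 bytes */
} record_t;

int main(void)
{
    hid_t file   = H5Fopen("test.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
    hid_t dset   = H5Dopen2(file, "/records", H5P_DEFAULT);
    hid_t fspace = H5Dget_space(dset);

    hsize_t nrecords;
    H5Sget_simple_extent_dims(fspace, &nrecords, NULL);

    /* In-memory compound type matching the struct layout. */
    hsize_t adims = 20;
    hid_t atype = H5Tarray_create2(H5T_NATIVE_DOUBLE, 1, &adims);
    hid_t mtype = H5Tcreate(H5T_COMPOUND, sizeof(record_t));
    H5Tinsert(mtype, "id",     HOFFSET(record_t, id),     H5T_NATIVE_LLONG);
    H5Tinsert(mtype, "values", HOFFSET(record_t, values), atype);

    const hsize_t block = 4096;           /* records per read */
    record_t *buf = malloc(block * sizeof(record_t));

    for (hsize_t start = 0; start < nrecords; start += block) {
        hsize_t count = (nrecords - start < block) ? (nrecords - start) : block;
        hid_t mspace = H5Screate_simple(1, &count, NULL);
        H5Sselect_hyperslab(fspace, H5S_SELECT_SET, &start, NULL, &count, NULL);
        H5Dread(dset, mtype, mspace, fspace, H5P_DEFAULT, buf);
        /* ... process 'count' records ... */
        H5Sclose(mspace);
    }

    free(buf);
    H5Tclose(atype); H5Tclose(mtype);
    H5Sclose(fspace); H5Dclose(dset); H5Fclose(file);
    return 0;
}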
What drives that optimal read size? For these tests, only one dataset
was written, so I assume the data was all written contiguously.
Chunking puzzles me. At first, I thought the chunk size was a number of
bytes (I haven't been able to find any documentation that explicitly
says whether it's a number of bytes, a number of records, or something
else), but now I'm not sure. Again, I ran some experiments and found
that there was a bit of extra overhead with a chunk size of 1, but there
really wasn't much difference between chunk sizes of 128, 512, and 2048
(in terms of writing speed, that is; there's definitely a difference in
file size). That said, when I tried the same test with a chunk size of
10240, it slowed down enough that I didn't bother letting it finish.
After playing around a bit more, it seems the largest chunk size I can
pick (in whatever units it happens to be in) that still completes in a
reasonable time frame is 6553; processing time increases by two orders
of magnitude going from 6553 to 6554.
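For reference, here's a stripped-down sketch of how the write test
creates the dataset; again, the compound type, names, and record count
are placeholders, and CHUNK_RECORDS is the setting I've been varying:

/* Simplified sketch of the write test: one chunked, compressed
 * dataset of compound records, written in blocks.  Type, names,
 * and sizes are placeholders. */
#include "hdf5.h"
#include <stdlib.h>

#define NRECORDS      1000000      /* placeholder dataset size       */
#define CHUNK_RECORDS 2048         /* the chunk setting under test   */
#define WRITE_BLOCK   4096         /* records written per H5Dwrite   */

typedef struct {
    long long id;
    double    values[20];          /* pads the record out to ~170 bytes */
} record_t;

int main(void)
{
    /* Compound type used for both the file and the memory buffer. */
    hsize_t adims = 20;
    hid_t atype = H5Tarray_create2(H5T_NATIVE_DOUBLE, 1, &adims);
    hid_t rtype = H5Tcreate(H5T_COMPOUND, sizeof(record_t));
    H5Tinsert(rtype, "id",     HOFFSET(record_t, id),     H5T_NATIVE_LLONG);
    H5Tinsert(rtype, "values", HOFFSET(record_t, values), atype);

    hid_t file = H5Fcreate("write_test.h5", H5F_ACC_TRUNC,
                           H5P_DEFAULT, H5P_DEFAULT);

    hsize_t dims = NRECORDS;
    hid_t fspace = H5Screate_simple(1, &dims, NULL);

    /* Chunking and compression: CHUNK_RECORDS is the value I've been
     * sweeping (1, 128, 512, 2048, 6553, 6554, 10240). */
    hsize_t chunk = CHUNK_RECORDS;
    hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 1, &chunk);
    H5Pset_deflate(dcpl, 6);

    hid_t dset = H5Dcreate2(file, "/records", rtype, fspace,
                            H5P_DEFAULT, dcpl, H5P_DEFAULT);

    record_t *buf = calloc(WRITE_BLOCK, sizeof(record_t));
    for (hsize_t start = 0; start < NRECORDS; start += WRITE_BLOCK) {
        hsize_t count = (NRECORDS - start < WRITE_BLOCK)
                            ? (NRECORDS - start) : WRITE_BLOCK;
        hid_t mspace = H5Screate_simple(1, &count, NULL);
        H5Sselect_hyperslab(fspace, H5S_SELECT_SET, &start, NULL, &count, NULL);
        H5Dwrite(dset, rtype, mspace, fspace, H5P_DEFAULT, buf);
        H5Sclose(mspace);
    }

    free(buf);
    H5Pclose(dcpl); H5Dclose(dset); H5Sclose(fspace);
    H5Tclose(rtype); H5Tclose(atype); H5Fclose(file);
    return 0;
}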
So what drives the optimal chunk size, if my concerns are 1) reading
quickly and 2) writing quickly, in that order? Obviously the files are a
lot smaller with the larger chunk sizes, but why does the processing
time suddenly skyrocket going from 6553 to 6554? And what units is the
chunk size specified in?
Thanks for any answers!