You have to pay attention to the chunk size and the access pattern. For example, a 3-dim dataset with shape [1000,1000,1000] can be accessed in many ways, such as successive xy-planes, xz-planes or yz-planes. The chunk shape can be defined such that a certain pattern is favoured. If you always access successive xz-planes, it makes sense to define a chunk shape [c1,1,c3]. However, if you sometimes also access xy-planes, such a chunk shape is very bad, because you need 1000 chunks to get all data along the y-axis, and the question is whether you have sufficient memory to keep all those chunks (so that the next xy-plane can be serviced without rereading them). So if you have different access patterns, the chunk shape has to be a compromise such that all patterns can be serviced reasonably well.
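To make the chunk counting above concrete, here is a minimal sketch in plain C++. It is not the attached test program and uses no HDF5 calls. The names Shape, chunksAlong, chunksPerPlane and elementsToHold are made up for this illustration, and the axis order [x,y,z] is assumed to match the shape notation used above.

```cpp
#include <array>
#include <cstddef>

// Illustrative type: extents along the x, y and z axes (assumed order).
using Shape = std::array<std::size_t, 3>;

// Number of chunks needed to cover `extent` elements along one axis
// (rounded up, because the last chunk may be only partly filled).
std::size_t chunksAlong(std::size_t extent, std::size_t chunk) {
    return (extent + chunk - 1) / chunk;
}

// Number of chunks touched when reading one full plane in which the
// index along `fixedAxis` is held constant (0 = yz-plane, 1 = xz-plane,
// 2 = xy-plane).
std::size_t chunksPerPlane(const Shape& data, const Shape& chunk,
                           std::size_t fixedAxis) {
    std::size_t n = 1;
    for (std::size_t axis = 0; axis < 3; ++axis) {
        if (axis != fixedAxis) {
            n *= chunksAlong(data[axis], chunk[axis]);
        }
    }
    return n;
}

// Number of elements that must stay in memory so that the next plane
// along `fixedAxis` can be served from the same chunks without
// rereading them from disk.
std::size_t elementsToHold(const Shape& data, const Shape& chunk,
                           std::size_t fixedAxis) {
    return chunksPerPlane(data, chunk, fixedAxis)
           * chunk[0] * chunk[1] * chunk[2];
}
```

For the [1000,1000,1000] dataset with an example chunk shape [100,1,100], an xz-plane touches 100 chunks, while an xy-plane touches 10000 chunks holding 100 million elements in total. A roughly cubic chunk such as [32,32,32] (also just an example value) touches 1024 chunks for every plane orientation, which is the kind of compromise meant above.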
I attach a C++ test program I wrote some time ago to test various access patterns. I hope it is clear enough.

Ger

>>> Alexander Tzokev 08/22/13 7:58 AM >>>
Hello Ger,

thank you for the reply. We noticed that the dataset was small and fit into memory, so we performed new experiments with a single 8 GB dataset. The RAM on the test workstation is 4 GB. Unfortunately the results are the same: there is no difference in execution time or I/O operations with different values for mdc_nelmts, rdcc_nelmts, rdcc_nbytes and rdcc_w0. If needed I can post the source code for the example program and some test results.
Attachment: tHDF5.cc
_______________________________________________
Hdf-forum is for HDF software users discussion.
http://mail.lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
