Hi all,

I need to fill a huge 3D array that is chunked along its second dimension. The data arrive as slices, each at a fixed index in the third dimension, so the layout has to be re-ordered as it is written. The chunks are uncompressed. When the data are read back, the access pattern sweeps through the second dimension, so the chunking layout makes sense for reading. The file lives on an SSD, so random access should be relatively fast. I cannot change the index order in which the data arrive.
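To make the setup concrete, here is roughly what the layout looks like if I describe it at the HDF5 C level (MATLAB sits on top of that library). The dataset name, the sizes, and the chunk shape below are only placeholders for illustration:

/* Sketch only: placeholder sizes, HDF5 1.6.x C API. */
#include "hdf5.h"

#define NX 256          /* dim 1 */
#define NY 4096         /* dim 2: the dimension reads sweep through */
#define NZ 1024         /* dim 3: incoming slices have a fixed index here */

int main(void)
{
    hsize_t dims[3]  = {NX, NY, NZ};
    hsize_t chunk[3] = {1, NY, 1};    /* placeholder chunk shape, long in dim 2 */

    hid_t file  = H5Fcreate("big.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    hid_t space = H5Screate_simple(3, dims, NULL);
    hid_t dcpl  = H5Pcreate(H5P_DATASET_CREATE);

    H5Pset_chunk(dcpl, 3, chunk);     /* chunked, uncompressed layout */

    /* 1.6.x signature of H5Dcreate (no extra property lists). */
    hid_t dset = H5Dcreate(file, "data", H5T_NATIVE_DOUBLE, space, dcpl);

    H5Dclose(dset);
    H5Pclose(dcpl);
    H5Sclose(space);
    H5Fclose(file);
    return 0;
}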
If the data were stored in a raw file, they could in theory be written continuously while the array fills up. With HDF5, however, this becomes painfully slow. The only way I have found to speed it up somewhat is to read as many slices as I can into memory and then write them together in batches, but I still see write transfers of under 2 MB/s on average. The file grows gradually as the slices are added; if that growth forces the existing data to be re-ordered, it could explain the slow writes.

I was wondering whether pre-allocating the entire file somehow could help, and what the best way to do that would be. I could not find any related API function. I do know the full data size before the collection starts. The only idea I have so far is to fill the array with some dummy value (not the fill value) by sweeping along the chunking dimension before adding the real slices. That would probably grow the file to its final size quickly, but I am not sure it helps at all, and it is definitely ugly.

I am using MATLAB 2007a with the HDF5 1.6.5 library it ships with. Thank you for your comments in advance.

Regards,
Balint
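P.S. In case it helps to see what I mean by writing the slices in batches, here is a rough sketch, again at the C level rather than in MATLAB. Every name and size is a placeholder, the dataset is assumed to have been created as in the snippet above, and the buffer is assumed to have already been re-ordered in memory to match the file element order:

/* Sketch of one batched write: 'buf' holds NBATCH incoming slices,
 * re-ordered into the file layout (NX x NY x NBATCH, C row-major),
 * written as a single hyperslab starting at third-dimension index k0. */
#include "hdf5.h"

#define NX 256
#define NY 4096
#define NBATCH 16        /* placeholder: how many slices fit in memory */

herr_t write_batch(hid_t dset, const double *buf, hsize_t k0)
{
    hsize_t start[3] = {0, 0, k0};
    hsize_t count[3] = {NX, NY, NBATCH};

    hid_t fspace = H5Dget_space(dset);                 /* file dataspace */
    H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start, NULL, count, NULL);

    hid_t mspace = H5Screate_simple(3, count, NULL);   /* memory dataspace */

    herr_t status = H5Dwrite(dset, H5T_NATIVE_DOUBLE, mspace, fspace,
                             H5P_DEFAULT, buf);

    H5Sclose(mspace);
    H5Sclose(fspace);
    return status;
}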
