Hi all,

I need to fill a huge 3D array, chunked in its second dimension. My data
arrive as slices with a fixed index in the third dimension, so they have
to be re-ordered into the chunked layout as they are written. The chunks
are uncompressed. When the data is read back, the access pattern sweeps
through the second dimension, so the chunking layout makes sense. The
data is stored on an SSD, so random access should be relatively fast. I
cannot change the index order of the incoming data.
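
For concreteness, here is roughly how the dataset is set up. I actually go
through MATLAB's HDF5 bindings, so this is only a sketch using the 1.6 C
API, and the file name, dataset name, sizes and chunk shape below are
placeholders rather than my real ones:

#include "hdf5.h"

int main(void)
{
    hsize_t dims[3]  = {256, 1024, 512};  /* full array size, known in advance (placeholder numbers) */
    hsize_t chunk[3] = {1, 1024, 1};      /* chunks span the second dimension (placeholder shape)    */

    hid_t file  = H5Fcreate("volume.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    hid_t space = H5Screate_simple(3, dims, NULL);

    hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 3, chunk);         /* chunked, uncompressed layout */

    /* 1.6.x H5Dcreate signature (five arguments) */
    hid_t dset = H5Dcreate(file, "/data", H5T_NATIVE_FLOAT, space, dcpl);

    H5Dclose(dset);
    H5Pclose(dcpl);
    H5Sclose(space);
    H5Fclose(file);
    return 0;
}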

In theory, the data could be written sequentially while filling up the
array if it were stored in a raw file. With HDF, however, this becomes
painfully slow. The only way I have found to speed it up somewhat is to
read as many slices as I can into memory and then write them out together
in batches, but I still see write transfers below 2 MB/sec on average.
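
The batched writes look roughly like this (again only a C-API sketch; the
function name and the nslices batching parameter are just for
illustration, not my actual code):

#include "hdf5.h"

/* Write nslices consecutive slices (a fixed range in the third dimension)
 * in a single H5Dwrite call via a hyperslab selection. */
herr_t write_slice_batch(hid_t dset, const float *buf,
                         hsize_t z0, hsize_t nslices,
                         hsize_t nx, hsize_t ny)
{
    hsize_t start[3] = {0, 0, z0};         /* first slice of the batch        */
    hsize_t count[3] = {nx, ny, nslices};  /* whole x-y extent, nslices deep  */

    hid_t fspace = H5Dget_space(dset);
    H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start, NULL, count, NULL);

    hid_t mspace = H5Screate_simple(3, count, NULL);

    herr_t status = H5Dwrite(dset, H5T_NATIVE_FLOAT, mspace, fspace,
                             H5P_DEFAULT, buf);

    H5Sclose(mspace);
    H5Sclose(fspace);
    return status;
}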

The file grows gradually as the slices are added. If each expansion
requires re-ordering the existing data, that could explain the slow write
speed. I was wondering whether pre-allocating the entire file somehow
would help with this, and what the best way to do that is. I could not
find any related API function. I know the total data size before the data
collection starts.

The only idea I have so far is to fill the array with some dummy value
(different from the fill value) by sweeping through the chunking dimension
before adding the slices, as sketched below. This would probably grow the
file to its final size quickly, but I am not sure it helps at all, and it
is definitely ugly.
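
In case it clarifies what I mean, a rough sketch of that dummy-fill sweep
(C API again, with placeholder names and a hypothetical dummy value of -1):

#include <stdlib.h>
#include "hdf5.h"

/* Touch every chunk once by sweeping along the second (chunking)
 * dimension before any real data arrives, so the file grows to its
 * final size up front. Purely illustrative. */
void prefill_with_dummy(hid_t dset, hsize_t nx, hsize_t ny, hsize_t nz)
{
    hsize_t count[3] = {nx, 1, nz};        /* one plane at a fixed second index */
    hid_t   mspace   = H5Screate_simple(3, count, NULL);
    hid_t   fspace   = H5Dget_space(dset);

    float *dummy = (float *) malloc((size_t)(nx * nz) * sizeof(float));
    for (size_t i = 0; i < (size_t)(nx * nz); i++)
        dummy[i] = -1.0f;                  /* some value other than the fill value */

    for (hsize_t y = 0; y < ny; y++) {
        hsize_t start[3] = {0, y, 0};
        H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start, NULL, count, NULL);
        H5Dwrite(dset, H5T_NATIVE_FLOAT, mspace, fspace, H5P_DEFAULT, dummy);
    }

    free(dummy);
    H5Sclose(fspace);
    H5Sclose(mspace);
}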

I am using MATLAB 2007a with the HDF5 1.6.5 library it ships with.

Thank you in advance for your comments.

Regards,

Balint