Francesc Alted wrote:
On Tuesday 13 July 2010 17:06:03, John Knutson wrote:
Is shuffle meant to work with compound types?  Are there things I need
to be considering in the organization of the axes of the data set in
order to better encourage compression?

Yes, shuffle is designed to work with compound types too. It operates at the chunk level, so depending on the shape of your chunks and how the data varies along each dimension of that shape, it *could* indeed have a measurable effect.
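For reference, here is a minimal sketch (using h5py rather than the C API, and with an illustrative file name and compound dtype, not the poster's actual ones) of enabling the shuffle and gzip filters together on a chunked compound dataset. The filters are applied per chunk, which is why the chunk shape matters for how well they work:

    import numpy as np
    import h5py

    # Illustrative compound record, roughly in the 100-200 byte range
    # discussed below (8 + 4 + 32*4 = 140 bytes).
    rec_dtype = np.dtype([
        ("time",  np.float64),
        ("flags", np.uint32),
        ("vals",  np.float32, (32,)),
    ])

    with h5py.File("example.h5", "w") as f:
        dset = f.create_dataset(
            "records",
            shape=(403200, 2, 128),
            dtype=rec_dtype,
            chunks=(16, 2, 128),   # chunk shape under discussion
            shuffle=True,          # byte-shuffle filter, applied per chunk
            compression="gzip",
            compression_opts=4,
        )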

Out of curiosity, what are the size of your compound type and your chunk shape?

The compound types (there are several) are around 100-200 bytes each. The chunks generally contain between 2K and 4K compound records each. That seemed to be the optimal chunk size based on earlier performance testing, at least as far as read and write performance is concerned. In more detail, a chunk shape might be something like:
16,2,128 in a data set of dimensions 403200,2,128

The above chunk size makes sense given the way data is being written into the file, but it might not make as much sense for compression.

Actually, I just thought about this for a bit and realized that I've been (probably needlessly) tying my chunk shapes to the read and write dataspaces. If, as I suspect, they're only loosely coupled, I can keep reading and writing the 16,2,128 selection while using a chunking that is better suited to compression, e.g. 4096,1,1. I'll have to experiment with that and see what happens.
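A sketch of that experiment, again in h5py with illustrative names and a simplified dtype: the chunk shape is a storage-layout property chosen at dataset creation, while reads and writes can still use whatever hyperslab selection is convenient. Whether the 4096,1,1 chunking actually compresses or performs better is exactly what would need measuring:

    import numpy as np
    import h5py

    rec_dtype = np.dtype([("time", np.float64), ("flags", np.uint32)])  # illustrative

    with h5py.File("rechunked.h5", "w") as f:
        dset = f.create_dataset(
            "records",
            shape=(403200, 2, 128),
            dtype=rec_dtype,
            chunks=(4096, 1, 1),   # chunking chosen with compression in mind...
            shuffle=True,
            compression="gzip",
        )

        # ...while I/O still proceeds in the familiar 16 x 2 x 128 slabs.
        block = np.zeros((16, 2, 128), dtype=rec_dtype)
        dset[0:16, :, :] = block          # write one slab
        readback = dset[0:16, :, :]       # read it back with the same selection

Note that a 16,2,128 write against 4096,1,1 chunks touches many partial chunks, so the read/write cost of the new layout needs to be weighed against any compression gain.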

