Francesc Alted wrote:
On Tuesday, 13 July 2010 17:06:03, John Knutson wrote:
Is shuffle meant to work with compound types? Is there anything I should
be considering in how the axes of the data set are organized in order to
get better compression?
Yes, shuffle is designed to work with compound types too. It operates at the
chunk level, so depending on the shape of your chunks and how the data varies
along each dimension of that shape, it *could* have a measurable effect indeed.
Out of curiosity, what is the size of your compound type, and what is your chunk size?
The compound types (there are several) are around 100-200 bytes each.
The chunks generally contain between 2K and 4K compound records apiece.
That seemed to be the optimal chunk size in earlier performance testing,
as far as reading and writing performance is concerned, anyway.
In more detail, the chunk shape might be something like 16,2,128 in a
data set of dimensions 403200,2,128. That chunk shape makes sense given
the way data is being written into the file, but it might not make as
much sense for compression.
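
For concreteness, a minimal sketch of that setup (using h5py for brevity; the
compound dtype below is invented just to give a record of roughly the size
described, and the same settings map to H5Pset_chunk, H5Pset_shuffle and
H5Pset_deflate in the C API):

import numpy as np
import h5py

# Illustrative ~140-byte compound record; the field names are made up.
record = np.dtype([
    ("time", "f8"),
    ("flags", "u4"),
    ("samples", "f4", (32,)),
])

with h5py.File("example.h5", "w") as f:
    dset = f.create_dataset(
        "records",
        shape=(403200, 2, 128),
        dtype=record,
        chunks=(16, 2, 128),   # chunk shape tied to the write pattern
        shuffle=True,          # byte-shuffle runs per chunk, before deflate
        compression="gzip",
    )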
Actually, I just thought about this for a bit and realized that I've
been (probably needlessly) tying my chunk shape to the read and write
data spaces. If, as I suspect, they're only loosely coupled (the chunk
layout is independent of the hyperslab selections used for I/O), I can
keep reading and writing the 16,2,128 space while using a chunking that
is more in tune with compression, e.g. 4096,1,1. I'll have to
experiment with that and see what happens.
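
Something like the following would exercise that idea (again h5py, with the
same invented record type; this is just a sketch of the decoupling, not a
tuned configuration):

import numpy as np
import h5py

record = np.dtype([
    ("time", "f8"),
    ("flags", "u4"),
    ("samples", "f4", (32,)),
])

with h5py.File("example.h5", "w") as f:
    # Chunk shape chosen with compression in mind, independent of the
    # 16,2,128 selections used for I/O.
    dset = f.create_dataset(
        "records",
        shape=(403200, 2, 128),
        dtype=record,
        chunks=(4096, 1, 1),
        shuffle=True,
        compression="gzip",
    )

    # Reads and writes can still use the familiar 16,2,128 selections;
    # HDF5 maps the selection onto whatever chunk layout the dataset has.
    block = np.zeros((16, 2, 128), dtype=record)
    dset[0:16, :, :] = block
    readback = dset[0:16, :, :]

One thing to watch when experimenting: with a 4096,1,1 chunk shape, each
16,2,128 write touches 256 different chunks, so any gain in compression
ratio may come at the cost of extra (de)compression work and chunk-cache
pressure.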