Francesc Alted wrote:
On Tuesday, 13 July 2010 17:06:03, John Knutson wrote:
Is shuffle meant to work with compound types? Is there anything I should
be considering in how the axes of the data set are organized in order to
get better compression?
Yes, shuffle is designed to work with compound types too. It operates at the
chunk level, so depending on the shape of your chunks and how the data varies
along each dimension of that shape, it *could* have a measurable effect indeed.
Out of curiosity, what is the size of your compound type, and what is your chunk size?
The compound types (there are several) are around 100-200 bytes each.
The chunks generally contain between 2K and 4K compound records apiece.
That seemed to be the optimal chunk size in earlier performance testing,
as far as reading and writing performance is concerned, anyway.
In more detail, the chunk shape might be something like 16,2,128 in a
data set of dimensions 403200,2,128. That chunk shape makes sense given
the way data is being written into the file, but it might not make as
much sense for compression.
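
For concreteness, a minimal sketch of that setup (using h5py for brevity; the
compound dtype below is invented just to give a record of roughly the size
described, and the same settings map to H5Pset_chunk, H5Pset_shuffle and
H5Pset_deflate in the C API):

import numpy as np
import h5py

# Illustrative ~140-byte compound record; the field names are made up.
record = np.dtype([
    ("time", "f8"),
    ("flags", "u4"),
    ("samples", "f4", (32,)),
])

with h5py.File("example.h5", "w") as f:
    dset = f.create_dataset(
        "records",
        shape=(403200, 2, 128),
        dtype=record,
        chunks=(16, 2, 128),   # chunk shape tied to the write pattern
        shuffle=True,          # byte-shuffle runs per chunk, before deflate
        compression="gzip",
    )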
Actually, I just thought about this for a bit and realized that I've
been (probably needlessly) tying my chunk shape to the read and write
data spaces. If, as I suspect, they're only loosely coupled (the chunk
layout is independent of the hyperslab selections used for I/O), I can
keep reading and writing the 16,2,128 space while using a chunking that
is more in tune with compression, e.g. 4096,1,1. I'll have to
experiment with that and see what happens.
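
Something like the following would exercise that idea (again h5py, with the
same invented record type; this is just a sketch of the decoupling, not a
tuned configuration):

import numpy as np
import h5py

record = np.dtype([
    ("time", "f8"),
    ("flags", "u4"),
    ("samples", "f4", (32,)),
])

with h5py.File("example.h5", "w") as f:
    # Chunk shape chosen with compression in mind, independent of the
    # 16,2,128 selections used for I/O.
    dset = f.create_dataset(
        "records",
        shape=(403200, 2, 128),
        dtype=record,
        chunks=(4096, 1, 1),
        shuffle=True,
        compression="gzip",
    )

    # Reads and writes can still use the familiar 16,2,128 selections;
    # HDF5 maps the selection onto whatever chunk layout the dataset has.
    block = np.zeros((16, 2, 128), dtype=record)
    dset[0:16, :, :] = block
    readback = dset[0:16, :, :]

One thing to watch when experimenting: with a 4096,1,1 chunk shape, each
16,2,128 write touches 256 different chunks, so any gain in compression
ratio may come at the cost of extra (de)compression work and chunk-cache
pressure.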