Hi everyone, thanks for the helpful comments.
I did check that:

1. compression is actually on, and
2. I am using the new 1.8 group format (this actually required me to write my first nontrivial Cython wrapper, since h5py does not provide access to LIBVER_LATEST).

Following the helpful advice on the chunk index overhead, I tried to use contiguous storage. Unfortunately, I again ran into an unsupported feature: h5py only supports resizable Datasets when they are chunked, even when using the "low-level" functions which wrap the HDF5 C API:

"h5py._stub.NotImplementedError: Extendible contiguous non-external dataset (Dataset: Feature is unsupported)"

Since I do need resizing, I guess I am stuck with chunked Datasets for now. I tried different chunk sizes, but that did not make a noticeable difference.

In conclusion, I see no way to get less than about a 15x file size overhead when using HDF5 with h5py for my data...

cheers,
Nils
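P.S. For concreteness, here is a minimal sketch of the kind of setup described above (latest-format file, compression on, chunked resizable dataset). It assumes a recent h5py that exposes the libver keyword directly (for that part I had to go through my Cython wrapper instead); the dataset name, shape, chunk size, and compression settings are only illustrative, not what I actually use:

import numpy as np
import h5py

# Open the file with the HDF5 1.8 "latest" format so the newer,
# more compact group/object headers are used.
with h5py.File("data.h5", "w", libver="latest") as f:
    # Resizable datasets must be chunked in h5py; contiguous storage
    # cannot be extended. Compression only applies to chunked data anyway.
    dset = f.create_dataset(
        "samples",
        shape=(0, 16),            # start empty along the first axis
        maxshape=(None, 16),      # unlimited growth along the first axis
        chunks=(1024, 16),        # chunk shape: the main knob to play with
        dtype="float32",
        compression="gzip",
        compression_opts=4,       # gzip level
        shuffle=True,             # byte-shuffle filter often helps gzip
    )

    # Append a block of rows by resizing and then writing into the new rows.
    block = np.random.rand(1024, 16).astype("float32")
    dset.resize(dset.shape[0] + block.shape[0], axis=0)
    dset[-block.shape[0]:] = block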

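And a similar sketch of how the chunk-size comparison can be done: write the same array with a few different chunk lengths and compare each file's size against the raw array size (the example data and chunk lengths below are made up; my real data is of course different):

import os
import numpy as np
import h5py

def write_with_chunks(path, data, chunk_rows):
    """Write `data` as a resizable, gzip-compressed dataset with the
    given chunk length along the first axis, and return the file size."""
    with h5py.File(path, "w", libver="latest") as f:
        f.create_dataset(
            "samples",
            data=data,
            maxshape=(None,) + data.shape[1:],
            chunks=(chunk_rows,) + data.shape[1:],
            compression="gzip",
        )
    return os.path.getsize(path)

# Made-up example data, only to illustrate the comparison.
data = np.random.rand(100_000, 4).astype("float32")
raw_bytes = data.nbytes

for chunk_rows in (64, 512, 4096, 32768):
    size = write_with_chunks("chunk_test.h5", data, chunk_rows)
    print(f"chunks=({chunk_rows}, ...): {size / raw_bytes:.2f}x raw size")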