Hi everyone, 

Thanks for the helpful comments.

I did check that
1. compression is actually on
2. I am using the new 1.8 group format
(this actually required me to write my first nontrivial Cython wrapper, since
h5py does not provide access to LIBVER_LATEST). A rough sketch of both checks is below.
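
This is only a sketch with placeholder file/dataset names; note that newer
h5py releases expose the 1.8 format directly through the libver="latest"
keyword, so no wrapper would be needed there:

    import h5py

    # Open with the 1.8 ("latest") file format; in current h5py this is a
    # plain keyword argument rather than a custom wrapper.
    with h5py.File("data.h5", "r", libver="latest") as f:
        dset = f["data"]                 # placeholder dataset name
        print(dset.compression)          # e.g. 'gzip' -> compression is on
        print(dset.compression_opts)     # e.g. 4
        print(dset.chunks)               # chunk shape, or None if contiguous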

Following the helpful advice on the chunk index overhead, I tried to use
contiguous storage. Unfortunately, I again ran into an unsupported feature:
resizable Datasets are only supported when they are chunked, even when using
the "low-level" h5py functions which wrap the HDF5 C-API:

"h5py._stub.NotImplementedError: Extendible contiguous non-external dataset
(Dataset: Feature is unsupported)"
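
For the record, the low-level attempt looked roughly like this (a minimal
sketch, not my exact code; the file name, dataset name, shape and dtype are
just illustrative): create an unlimited dataspace, force a contiguous layout
via the dataset creation property list, and the create call is where the
library refuses.

    import h5py

    f = h5py.File("demo.h5", "w")
    # Unlimited maxshape, i.e. a resizable dataset...
    space = h5py.h5s.create_simple((100,), (h5py.h5s.UNLIMITED,))
    # ...but with a contiguous (non-chunked) layout.
    dcpl = h5py.h5p.create(h5py.h5p.DATASET_CREATE)
    dcpl.set_layout(h5py.h5d.CONTIGUOUS)
    # This is where the "Extendible contiguous non-external dataset"
    # error above is raised.
    h5py.h5d.create(f.id, b"data", h5py.h5t.NATIVE_DOUBLE, space, dcpl)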

Since I do need resizing, I guess I am stuck with chunked Datasets for now.
I tried different chunk sizes, but that did not make a noticeable difference.
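
For reference, a chunked, resizable, compressed dataset can be set up like
this (a sketch only: the 1-D shape, float64 dtype, gzip level and the
4096-element chunks are illustrative, not my actual parameters; the chunk
shape is the value I varied):

    import numpy as np
    import h5py

    with h5py.File("demo_chunked.h5", "w", libver="latest") as f:
        dset = f.create_dataset(
            "data",
            shape=(0,),
            maxshape=(None,),      # resizable along the first axis
            chunks=(4096,),        # illustrative chunk shape
            compression="gzip",
            compression_opts=4,
        )
        block = np.arange(4096, dtype=np.float64)
        dset.resize((block.size,))
        dset[:] = block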

In conclusion, I see no way to get the file size overhead below roughly 15x
when using HDF5 with h5py for my data.

cheers, Nils