Hello! I've found a situation that looks like a bug to me. I've figured out how to work around it, but I wanted to bring it up in case anyone else runs into it.
I use the Fortran API, and I typically create HDF5 datasets with large, multidimensional chunk sizes, but I only write part of a chunk at any given time. For example, I'll use a chunk size of 1000 x 200 x 50 but only write 1000 x 200 x 1 elements at a time. This seems to work fine, although on networked filesystems I sometimes notice that my application is I/O-limited. The solution there is to buffer my HDF5 writes locally and then write a full chunk at a time.
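For anyone who wants to try the same workaround, here is a minimal sketch of the buffering idea in Python rather than Fortran, using only the standard library. The `ChunkBuffer` class and the `write_chunk` callback are hypothetical names for illustration and are not part of the HDF5 API; in a real program the callback would be whatever routine performs the actual full-chunk dataset write.

```python
class ChunkBuffer:
    """Collect slabs locally and hand them to a writer one full chunk at a time."""

    def __init__(self, slabs_per_chunk, write_chunk):
        self.slabs_per_chunk = slabs_per_chunk
        # Hypothetical callback: stands in for the real full-chunk HDF5 write.
        self.write_chunk = write_chunk
        self.pending = []

    def add_slab(self, slab):
        # Hold the slab in memory; only touch the file once a chunk is complete.
        self.pending.append(slab)
        if len(self.pending) == self.slabs_per_chunk:
            self.flush()

    def flush(self):
        # Write out whatever is buffered (a full chunk, or a final partial one).
        if self.pending:
            self.write_chunk(list(self.pending))
            self.pending.clear()


# Usage: 120 slabs with 50 slabs per chunk, as in the 1000 x 200 x 50 example.
written = []
buf = ChunkBuffer(slabs_per_chunk=50, write_chunk=written.append)
for i in range(120):
    buf.add_slab(i)
buf.flush()  # push out the trailing partial chunk
chunk_lengths = [len(c) for c in written]
```

With this arrangement each chunk reaches the file in a single write, apart from a possible partial chunk at the very end.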
Recently, I decided to try out the deflate/zlib filter. I've noticed that when I buffer the data locally and write a full chunk at a time, it works beautifully and compresses nicely. But if I do not write a full chunk at a time (say just 1000 x 200 x 1 elements), then my HDF5 file explodes in size. When I examine it with h5stat, I see that the 'raw data' size is about what I'd expect (tens of megabytes), but the 'unaccounted space' size is a few gigabytes.
From what I can tell, the deflate filter is applied to the full chunk even though I haven't written the whole thing yet, and as I add more data to the chunk, it doesn't overwrite, remove, or re-optimize the parts it has already written. It's as if it deflates a full chunk for each small-ish write. I haven't seen anything in the documentation or the forum to confirm this, but it seems like a problem. If it isn't something easily addressed, I think there should perhaps be a warning about this inefficiency in the documentation for the deflate filter.
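To make that guess concrete, here is a rough model of it using Python's standard `zlib` module. It is only a sketch of the hypothesis, not a demonstration of what the HDF5 library actually does: it assumes the whole chunk is recompressed after every partial write and that none of the earlier compressed copies are ever reclaimed. The sizes are scaled down from the 1000 x 200 x 50 example (50 slabs per chunk), and `SLAB_BYTES`, `make_slab`, and the synthetic data are made up for illustration.

```python
import zlib

SLAB_BYTES = 4000        # scaled-down stand-in for one 1000 x 200 x 1 slab
SLABS_PER_CHUNK = 50     # matches the 1000 x 200 x 50 chunk in the example


def make_slab(i):
    # Synthetic, compressible data; purely illustrative.
    return bytes((i + j) % 7 for j in range(SLAB_BYTES))


# The chunk starts out zero-filled, like an unwritten chunk with a zero fill value.
chunk = bytearray(SLAB_BYTES * SLABS_PER_CHUNK)

# Assumed behaviour: after each slab is written, the *entire* chunk is deflated
# again and stored as a new copy, while the old copies stay in the file.
rewritten_total = 0
passes = 0
for i in range(SLABS_PER_CHUNK):
    chunk[i * SLAB_BYTES:(i + 1) * SLAB_BYTES] = make_slab(i)
    rewritten_total += len(zlib.compress(bytes(chunk), 6))
    passes += 1

# Buffered case: the complete chunk is deflated exactly once.
single_pass = len(zlib.compress(bytes(chunk), 6))

print("bytes stored with one deflate per slab:", rewritten_total)
print("bytes stored with one deflate per chunk:", single_pass)
```

Under these assumptions the per-slab total is always larger than the single-pass size, since the final pass alone already costs as much as the buffered write. Whether that accounts for the gigabytes of 'unaccounted space' that h5stat reports is exactly what I can't confirm.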
Thanks!

--
Patrick Vacek
Engineering Scientist Associate
Applied Research Labs, University of Texas
_______________________________________________ Hdf-forum is for HDF software users discussion. [email protected] http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org Twitter: https://twitter.com/hdf5
