Hi,

I've had a similar experience with this when writing streams of 2D data, and I've also noticed that performance is much slower if I don't write whole chunks at a time. I would have thought (assuming you've sized the chunk cache suitably) that each 1000x200x1 write would gradually fill up a 1000x200x50 chunk in the cache, and that the whole chunk would then be deflated once when it's evicted from the cache and written to disk once. But judging by the performance I see, I can only guess it isn't working like this, so I also just buffer whole chunks myself.
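For what it's worth, a minimal sketch of how the chunk cache can be sized via the C API (the Fortran wrapper is analogous); the file name, dataset name, and hash-slot count here are just placeholders, not anything from the original post:

    #include "hdf5.h"
    #include <stdlib.h>

    int main(void)
    {
        /* Chunk geometry from the example below: 1000 x 200 x 50 doubles. */
        const size_t chunk_bytes = (size_t)1000 * 200 * 50 * sizeof(double);

        hid_t file = H5Fopen("data.h5", H5F_ACC_RDWR, H5P_DEFAULT);

        /* Per-dataset access property list: make the raw-data chunk cache
         * big enough to hold one whole chunk, and set w0 = 1.0 so chunks
         * are preferentially evicted once they have been fully written. */
        hid_t dapl = H5Pcreate(H5P_DATASET_ACCESS);
        H5Pset_chunk_cache(dapl, 12421 /* hash slots, a prime */,
                           chunk_bytes, 1.0);

        hid_t dset = H5Dopen2(file, "/mydata", dapl);

        /* ... 1000 x 200 x 1 hyperslab writes go here ... */

        H5Dclose(dset);
        H5Pclose(dapl);
        H5Fclose(file);
        return 0;
    }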
Dan

-----Original Message-----
From: Hdf-forum [mailto:[email protected]] On Behalf Of Patrick Vacek
Sent: 22 March 2016 20:55
To: [email protected]
Subject: [Hdf-forum] Deflate and partial chunk writes

Hello!

I've found an interesting situation that seems like something of a bug to me. I've figured out how to work around it, but I wanted to bring it up in case it affects anyone else.

I use the Fortran API, and I typically create HDF5 datasets with large, multidimensional chunk sizes but only write part of a chunk at any given time. For example, I'll use a chunk size of 1000 x 200 x 50 but write only 1000 x 200 x 1 elements at a time. This seems to work fine, although on networked filesystems I sometimes notice that my application is I/O-limited. The solution is to buffer our HDF5 writes locally and then write a full chunk at a time.

Recently, I decided to try out the deflate/zlib filter. When I buffer the data locally and write a full chunk at a time, it works beautifully and compresses nicely. But if I do not write a full chunk at a time (say, just 1000 x 200 x 1 elements), then my HDF5 file explodes in size. When I examine it with h5stat, the 'raw data' size is about what I'd expect (tens of megabytes), but the 'unaccounted space' is a few gigabytes. From what I can tell, the deflate filter is applied to the full chunk even though I haven't written the whole thing yet, and as I add more to it, the library doesn't overwrite, remove, or re-optimize the parts it has already written. It's as if it deflates a full chunk for each small-ish write. I haven't seen anything in the documentation or on the forum to confirm this, but it seems like a problem. If it isn't something easily addressed, I think there should perhaps be a warning about this inefficiency in the documentation for the deflate filter.

Thanks!

--
Patrick Vacek
Engineering Scientist Associate
Applied Research Labs, University of Texas
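For concreteness, here is a minimal C-API sketch of the write pattern described above (the Fortran calls are analogous); the file name, dataset name, total extent, and compression level are illustrative, not taken from Patrick's code:

    #include "hdf5.h"

    int main(void)
    {
        hsize_t dims[3]  = {1000, 200, 500};  /* 10 chunks along dim 3 */
        hsize_t chunk[3] = {1000, 200, 50};

        hid_t file  = H5Fcreate("test.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
        hid_t space = H5Screate_simple(3, dims, NULL);

        /* Chunked, deflated dataset as described in the post. */
        hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
        H5Pset_chunk(dcpl, 3, chunk);
        H5Pset_deflate(dcpl, 6);  /* zlib level 6 */

        hid_t dset = H5Dcreate2(file, "/mydata", H5T_NATIVE_DOUBLE, space,
                                H5P_DEFAULT, dcpl, H5P_DEFAULT);

        static double slab[1000][200][1];     /* zero-initialized */
        hsize_t count[3] = {1000, 200, 1};
        hid_t memspace = H5Screate_simple(3, count, NULL);

        /* Each write touches a 1000 x 200 x 50 chunk but fills only
         * 1/50th of it; with deflate enabled, this is the pattern that
         * produces the 'unaccounted space' blow-up reported above. */
        for (hsize_t k = 0; k < dims[2]; k++) {
            hsize_t start[3] = {0, 0, k};
            H5Sselect_hyperslab(space, H5S_SELECT_SET, start, NULL, count, NULL);
            H5Dwrite(dset, H5T_NATIVE_DOUBLE, memspace, space, H5P_DEFAULT, slab);
        }

        H5Sclose(memspace);
        H5Sclose(space);
        H5Pclose(dcpl);
        H5Dclose(dset);
        H5Fclose(file);
        return 0;
    }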
