On Tue, 2010-12-14 at 05:40 -0800, Quincey Koziol wrote:
> The primary problem is the space allocation that has to happen when
> data is compressed. This is particularly a problem when performing
> independent I/O, since the other processes aren't involved, but
> [eventually] need to know about space that was allocated. Collective
> I/O is easier, but still will require changes to HDF5, etc. Are you
> wanting to use collective or independent I/O for your dataset writing?

I've had to deal with the space allocation issue for different reasons, using custom compression filters (Peter Lindstrom's FPZIP and HZIP, for structured meshes of hexes or tets and the variables defined on them).

I think the HDF5 lib could 'solve' the allocation problem using an approach I took. However, you do have to 'get comfortable' with the idea that you might not utilize space in the file 100% optimally.

Here is how it would work. Define a target compression ratio, R:1, that *must* be achieved for a given dataset. If the dataset is N bytes uncompressed, it will be NO MORE than N/R bytes compressed. Allocate N/R bytes in the file for this dataset. Because N and R are known before any data is compressed, the allocation can be made up front.

If you succeed in compressing by at least a ratio of R, you're golden. If not, fail the write and return an 'unable to compress to target ratio' error. The caller can decide to try again with a different target ratio (which will probably require some collective communication, as all procs will need to know the newer size).

If you succeed and compress by MORE than a ratio of R, you waste some space in the file. So what. Disk is cheap!

Sometimes you can take a small sample of the dataset (say the first M bytes, or some bytes from the beginning, middle and end), compress it yourself to get an approximate idea of how 'compressible' it might be, and then set R based on that quick approximation.

In addition, if HDF5 returned to you information on how 'well' it was doing relative to compression targets (something like 'did better than target ratio by 10%' or 'missed target ratio by 3%'), you could adjust the target ratio as necessary.

Mark

-- 
Mark C. Miller, Lawrence Livermore National Laboratory
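
A minimal sketch of the target-ratio scheme described above, in Python. This is not HDF5 API and not the filter code mentioned in the message: zlib stands in for the real compressor, and the names reserved_size, compress_to_target, estimate_ratio and TargetRatioError are hypothetical, chosen only for illustration. It shows the three pieces: reserving N/R bytes up front, failing the write when the target is missed, and guessing R from samples taken at the beginning, middle and end.

```python
import math
import zlib


class TargetRatioError(Exception):
    """Raised when data does not compress to the target ratio."""


def reserved_size(n_bytes, ratio):
    """Bytes to reserve up front for n_bytes of raw data at a target of ratio:1."""
    return math.ceil(n_bytes / ratio)


def compress_to_target(data, ratio):
    """Compress data and fail if it does not fit in the reserved N/R bytes.

    Returns (payload, slack), where slack is the number of reserved bytes
    left unused (the space "wasted" by beating the target).
    """
    budget = reserved_size(len(data), ratio)
    payload = zlib.compress(data)  # stand-in for the real filter
    if len(payload) > budget:
        raise TargetRatioError(
            f"unable to compress to target ratio {ratio}:1 "
            f"({len(payload)} bytes > {budget} reserved)"
        )
    return payload, budget - len(payload)


def estimate_ratio(data, sample_size=4096):
    """Rough compressibility estimate from the beginning, middle and end."""
    if not data:
        return 1.0
    if len(data) <= 3 * sample_size:
        sample = data
    else:
        mid = (len(data) - sample_size) // 2
        sample = (
            data[:sample_size]
            + data[mid:mid + sample_size]
            + data[-sample_size:]
        )
    return len(sample) / len(zlib.compress(sample))
```

A caller might estimate R first, back off a little for safety, and retry with a smaller R on TargetRatioError. In a parallel write, every process would have to agree on the new R before retrying, since the reserved size changes for all of them.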
