On Sat, Jun 26, 2010 at 2:02 PM, Quincey Koziol <[email protected]> wrote:

> Hi Leigh!
>
> On Jun 26, 2010, at 8:06 AM, Leigh Orf wrote:
>
>> I am a fan of the scale-offset filter followed by the gzip filter to
>> really reduce the size of big 3D datasets of weather model data. I am
>> using this compression strategy with HDF5 to do massively parallel
>> simulations, writing out one HDF5 file per MPI process.
>
> Glad they are turning out to be useful to you. Adding a "shuffle"
> filter preprocessing step may improve the compression ratio further.
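For anyone curious why shuffle can help, here is a toy, stdlib-only sketch of the shuffle + gzip stages. This is not HDF5's actual filter code: zlib stands in for the gzip filter, and the data is just small quantized integers of the sort the scale-offset filter emits. The shuffle pass regroups the bytes so that every value's 0th byte comes first, then every 1st byte, and so on; the nearly constant high bytes then form long runs that deflate compresses well.

```python
# Toy model of byte-shuffle preprocessing before deflate (not HDF5's code).
import random
import struct
import zlib

random.seed(0)
n = 10000
# Small ints, like scale-offset output: the top two bytes of each
# little-endian 32-bit int are always zero.
values = [random.randrange(10000) for _ in range(n)]
raw = struct.pack(f"<{n}i", *values)

# Shuffle: gather byte k of every value, for k = 0..3, so all the
# low bytes come first and all the (zero) high bytes come last.
shuffled = bytes(raw[j * 4 + k] for k in range(4) for j in range(n))

plain_size = len(zlib.compress(raw, 9))
shuf_size = len(zlib.compress(shuffled, 9))
print(f"gzip alone: {plain_size} bytes, shuffle+gzip: {shuf_size} bytes")
```

On data like this the shuffled stream compresses noticeably better, since the constant high bytes are no longer interleaved with the varying low bytes.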
You know, I once tried scaleoffset -> shuffle -> gzip, but it didn't make
the files smaller; it made them bigger. Then again, maybe I messed
something up. I'll try it again.

>> I recently discovered, when rendering data spanning multiple files,
>> that there is a boundary issue as you hop from one dataset to the
>> next: there is a slight discontinuity in the uncompressed
>> floating-point data as you go from one file to the next. I would
>> imagine this has to do with the internal parameters chosen by the
>> filter algorithm, which must look for the maximum and minimum values
>> in the dataset being operated upon, and these will vary from file to
>> file (from MPI process to MPI process).
>
> Hmm, yes, I would expect that...
>
>> Is there some way to have the scale-offset filter use global
>> parameters so that the discontinuities vanish? Before I used HDF5 I
>> used HDF4 and wrote my own scale/offset filter, which used the global
>> max and min values (determined with a collective MPI call), and this
>> worked fine. However, I like the transparency of the HDF5 filters and
>> would prefer not to write my own.
>
> It's definitely a good idea, but since each dataset is compressed
> independently, there isn't a way to have a global set of min/max
> values, at least currently. However, I don't imagine it would be too
> difficult to add a new "scale type" to the filter... I'll add an issue
> to our bugtracker and Elena can prioritize it with the other work
> there. If you'd like to submit a patch or find a little bit of funding
> for us to perform this work, that'll speed things up. :-)

That would probably be the best approach, together with a new routine
like H5Pset_scaleoffset_maxmin. A collective reduction (MPI_Allreduce
with MPI_MAX / MPI_MIN) would obtain the global extrema, which could then
be fed to the scaleoffset routine with minimal pain / code changes. If
you have pointers on how to add this kind of functionality, I'd be happy
to try submitting a patch.
I did look into the code, and there are several routines designed to
calculate max and/or min for different datatypes. In essence I would be
removing functionality from the filter, not adding new functionality!

Concerning the funding, I feel your pain, believe me!

Leigh

> Quincey
>
> _______________________________________________
> Hdf-forum is for HDF software users discussion.
> [email protected]
> http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org

--
Leigh Orf
Associate Professor of Atmospheric Science
Room 130G Engineering and Technology
Department of Geology and Meteorology
Central Michigan University
Mount Pleasant, MI 48859
(989)774-1923
Amateur radio callsign: KG4ULP
