On Sat, Jun 26, 2010 at 2:02 PM, Quincey Koziol <[email protected]> wrote:
> Hi Leigh!
>
> On Jun 26, 2010, at 8:06 AM, Leigh Orf wrote:
>
>> I am a fan of the scale-offset filter followed by the gzip filter to
>> really reduce the size of big 3D datasets of weather model data. I am
>> using this compression strategy with HDF5 to do massively parallel
>> simulations and writing out one HDF5 file per MPI process.
>
>        Glad they are turning out to be useful to you.  Adding a "shuffle" 
> filter preprocessing step may improve the compression ratio further.

You know, I once tried scaleoffset -> shuffle -> gzip, but it didn't
make the files smaller; it made them bigger. Or maybe I messed
something up; I'll try it again.
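For reference, here is roughly how I set that pipeline up (a sketch only — dataset name, dimensions, chunk sizes, scale factor, and gzip level are placeholders, not my actual production values):

```c
#include "hdf5.h"

/* Sketch: create a chunked 3D float dataset with the
 * scale-offset -> shuffle -> deflate (gzip) filter pipeline.
 * Filters run in the order they are added to the dataset
 * creation property list. */
hid_t make_compressed_dset(hid_t file_id)
{
    hsize_t dims[3]  = {128, 128, 128};   /* placeholder sizes  */
    hsize_t chunk[3] = {32, 32, 32};      /* placeholder chunks */

    hid_t space = H5Screate_simple(3, dims, NULL);
    hid_t dcpl  = H5Pcreate(H5P_DATASET_CREATE);

    H5Pset_chunk(dcpl, 3, chunk);
    /* D-scale variant: keep 3 decimal digits of precision */
    H5Pset_scaleoffset(dcpl, H5Z_SO_FLOAT_DSCALE, 3);
    H5Pset_shuffle(dcpl);                 /* byte shuffle before deflate */
    H5Pset_deflate(dcpl, 6);              /* gzip, level 6 */

    hid_t dset = H5Dcreate2(file_id, "/w", H5T_NATIVE_FLOAT, space,
                            H5P_DEFAULT, dcpl, H5P_DEFAULT);
    H5Pclose(dcpl);
    H5Sclose(space);
    return dset;
}
```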

>
>> I recently discovered, when rendering data spanning multiple files,
>> that there is a boundary issue as you hop from one dataset to the
>> next: a slight discontinuity in the uncompressed floating point
>> values as you go from one file to the next. I would imagine this has
>> to do with the internal parameters chosen by the filter algorithm,
>> which must determine the maximum and minimum values of the dataset
>> being operated on; these will vary from file to file (from MPI
>> process to MPI process).
>
>        Hmm, yes, I would expect that...
>
>> Is there some way to have the scale-offset filter use global
>> parameters such that the discontinuities vanish? Before I used HDF5,
>> I used HDF4 and wrote my own scale/offset filter, which used the
>> global max and min values (determined with a collective MPI call),
>> and this worked fine. However, I like the transparency of the HDF5
>> filters and would prefer not to write my own.
>
>        It's definitely a good idea, but since each dataset is compressed 
> independently, there isn't a way to have a global set of min/max values, at 
> least currently.  However, I don't imagine it would be too difficult to add a 
> new "scale type" to the filter...  I'll add an issue to our bugtracker and 
> Elena can prioritize it with the other work there.  If you'd like to submit a 
> patch or find a little bit of funding for us to perform this work, that'll 
> speed things up. :-)

That would probably be the best approach: add a new routine like
H5Pset_scaleoffset_maxmin. A collective MPI_MAX / MPI_MIN reduction
would compute the global values, and they could be fed to the
scale-offset routine with minimal pain / code changes. If you have
pointers on how to add this kind of functionality, I'd be happy to
try submitting a patch.
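The collective step I have in mind would look something like this (a sketch; H5Pset_scaleoffset_maxmin is the hypothetical routine being proposed here, not an existing HDF5 call):

```c
#include <mpi.h>
#include "hdf5.h"

/* Sketch: compute the global min/max across all MPI ranks and hand
 * them to the proposed scale-offset routine, so every per-process
 * file uses the same scaling parameters. */
void set_global_scaleoffset(hid_t dcpl, const float *data, size_t n)
{
    float local_min = data[0], local_max = data[0];
    for (size_t i = 1; i < n; i++) {
        if (data[i] < local_min) local_min = data[i];
        if (data[i] > local_max) local_max = data[i];
    }

    float global_min, global_max;
    MPI_Allreduce(&local_min, &global_min, 1, MPI_FLOAT, MPI_MIN,
                  MPI_COMM_WORLD);
    MPI_Allreduce(&local_max, &global_max, 1, MPI_FLOAT, MPI_MAX,
                  MPI_COMM_WORLD);

    /* Hypothetical API, as proposed above -- not in HDF5 today */
    H5Pset_scaleoffset_maxmin(dcpl, global_max, global_min);
}
```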

I did look into the code and there are several routines designed to
calculate max and/or min for different datatypes. In essence I would
be removing functionality from the filter, not adding new
functionality!

Concerning the funding, I feel your pain, believe me!

Leigh

>
>        Quincey
>
>
> _______________________________________________
> Hdf-forum is for HDF software users discussion.
> [email protected]
> http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
>



-- 
Leigh Orf
Associate Professor of Atmospheric Science
Room 130G Engineering and Technology
Department of Geology and Meteorology
Central Michigan University
Mount Pleasant, MI 48859
(989)774-1923
Amateur radio callsign: KG4ULP
