Hi Leigh,

On Jun 26, 2010, at 10:09 PM, Leigh Orf wrote:

> On Sat, Jun 26, 2010 at 2:02 PM, Quincey Koziol <[email protected]> wrote:
>> Hi Leigh!
>> 
>> On Jun 26, 2010, at 8:06 AM, Leigh Orf wrote:
>> 
>>> I am a fan of the scale-offset filter followed by the gzip filter to
>>> really reduce the size of big 3D datasets of weather model data. I am
>>> using this compression strategy with HDF5 to do massively parallel
>>> simulations and writing out one HDF5 file per MPI process.
>> 
>>        Glad they are turning out to be useful to you.  Adding a "shuffle" 
>> filter preprocessing step may improve the compression ratio further.
> 
> You know, I once tried scaleoffset -> shuffle -> gzip but it didn't
> make the files smaller, it made them bigger... or maybe I messed
> something up, I'll try it again.

        It's best to put the shuffle filter first, so it rearranges the 
uncompressed bytes before they reach the compressor.  Putting it later in the 
pipeline will have no effect on the compression ratio and will just chew up 
cycles.
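
        For reference, here's a rough sketch of setting that order up on a 
dataset creation property list.  Filters run on each chunk in the order the 
H5Pset_* calls are made; the chunk dimensions, decimal scale factor, and 
deflate level below are only placeholders you'd tune for your data, and 
file_id / space_id are assumed to have been created already:

#include "hdf5.h"

/* Sketch only: create a chunked 3D float dataset with the filter order
 * shuffle -> scale-offset -> gzip.  Chunk dims, the decimal scale factor,
 * and the deflate level are placeholders to tune for your data. */
hid_t create_compressed_dset(hid_t file_id, hid_t space_id, const char *name)
{
    hsize_t chunk[3] = {16, 64, 64};            /* tune for your grid */
    hid_t   dcpl     = H5Pcreate(H5P_DATASET_CREATE);

    H5Pset_chunk(dcpl, 3, chunk);
    H5Pset_shuffle(dcpl);                       /* byte shuffle first */
    H5Pset_scaleoffset(dcpl, H5Z_SO_FLOAT_DSCALE, 3); /* ~3 decimal digits */
    H5Pset_deflate(dcpl, 6);                    /* gzip level 6 */

    hid_t dset = H5Dcreate2(file_id, name, H5T_NATIVE_FLOAT, space_id,
                            H5P_DEFAULT, dcpl, H5P_DEFAULT);
    H5Pclose(dcpl);
    return dset;
}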

>>> I recently discovered, when rendering data spanning multiple files, that
>>> there is a boundary issue as you hop from one dataset to the next:
>>> a slight discontinuity in the uncompressed floating-point values as you
>>> go from one file to the next. I would imagine this has to do with the
>>> internal parameters chosen by the filter algorithm, which must look for
>>> the maximum and minimum values in the dataset being operated upon, and
>>> those will vary from file to file (from MPI proc. to MPI proc.).
>> 
>>        Hmm, yes, I would expect that...
>> 
>>> Is there some way to have the scale offset filter use global
>>> parameters such that the discontinuities vanish? Before I used HDF5 I
>>> used HDF4 and wrote my own scale/offset filter which used the global
>>> max and min values (using a collective MPI call to determine this) and
>>> this worked fine. However I like the transparency of the HDF5 filters
>>> and would prefer to not write my own.
>> 
>>        It's definitely a good idea, but since each dataset is compressed 
>> independently, there isn't a way to have a global set of min/max values, at 
>> least currently.  However, I don't imagine it would be too difficult to add 
>> a new "scale type" to the filter...  I'll add an issue to our bugtracker and 
>> Elena can prioritize it with the other work there.  If you'd like to submit 
>> a patch or find a little bit of funding for us to perform this work, that'll 
>> speed things up. :-)
> 
> That would probably be the best approach: add a new "scale type" and a
> new routine like H5Pset_scaleoffset_maxmin. A collective MPI_MAX /
> MPI_MIN call would get the global max and min, and those could be fed
> to the scale-offset routine with minimal pain / code changes. If you
> have pointers as to how to add this kind of functionality, I'd be happy
> to try submitting a patch.

        I'm happy to give you some guidance, but I'm on vacation this week, so 
you'll have to ping me again next week.
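
        In the meantime, the MPI side really is just a pair of allreduce 
calls.  Something along these lines, roughly -- keeping in mind that 
H5Pset_scaleoffset_maxmin is the routine you're proposing, so its name and 
signature here are just a guess, not something in the library today:

#include <float.h>
#include <mpi.h>
#include "hdf5.h"

/* Sketch only: compute the global min/max of the local field across all
 * ranks, then hand them to the proposed property-list call so every
 * process encodes with the same scale/offset parameters. */
void set_global_scaleoffset(hid_t dcpl, const float *data, size_t n,
                            MPI_Comm comm)
{
    float local_min = FLT_MAX, local_max = -FLT_MAX;
    for (size_t i = 0; i < n; i++) {
        if (data[i] < local_min) local_min = data[i];
        if (data[i] > local_max) local_max = data[i];
    }

    float global_min, global_max;
    MPI_Allreduce(&local_min, &global_min, 1, MPI_FLOAT, MPI_MIN, comm);
    MPI_Allreduce(&local_max, &global_max, 1, MPI_FLOAT, MPI_MAX, comm);

    /* Hypothetical API -- the routine proposed above, not in HDF5 today. */
    H5Pset_scaleoffset_maxmin(dcpl, (double)global_max, (double)global_min);
}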

> I did look into the code and there are several routines designed to
> calculate max and/or min for different datatypes. In essence I would
> be removing functionality from the filter, not adding new
> functionality!

        :-)

        Quincey

