Hi Matt,

 I seem to remember that there is already a compression filter in HDF5 (the
scale-offset filter) that allows storing an IEEE floating-point number as an
integer value with a floating-point scale and offset parameter attached to it.
On reading you get floating-point values, limited to the precision of the
underlying integers, but also benefiting from integer compression schemes.

For 'special values' it might be best to use an attribute on the dataset
that tells which value, or even value range, means something particular,
such as a mask. You could even use HDF5's internal default fill value for
this purpose: this value is returned if you, for instance, read a chunk
of a chunked dataset that does not exist on disk, which makes it a good
candidate for marking an 'undefined' region in the data.

http://www.hdfgroup.org/HDF5/doc_resource/H5Fill_Values.html

You would just need to set a fill value that does not occur as valid data.
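A numpy sketch of the reading side of that convention, assuming a sentinel fill value chosen well outside the valid density range:

```python
import numpy as np

FILL = -1.0e30  # assumed sentinel: must never occur as valid density

# a chunk that was never written comes back filled with FILL
chunk = np.full((4, 4), FILL)
# ... some region holds real data
chunk[1:3, 1:3] = [[2.5, 3.0], [4.0, 5.5]]

defined = chunk != FILL                   # boolean mask of valid samples
valid = np.ma.masked_equal(chunk, FILL)   # masked view for statistics
mean_density = valid.mean()               # fill values excluded automatically
```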

Shifting data values to circumvent limitations of the visualization
software sounds like a last-resort solution, but if required, such a
shift should be fairly quick with a compression-like filter specific to
this software. In the HDF5 file itself you might prefer to keep the data
values as close to their original values as possible, so doing the
dynamic shift (specific to each viz software that you use) upon
reading sounds better than doing it during writing. I would expect the
computational cost to be negligible compared to disk I/O; however,
there is an overhead in terms of RAM usage, since compression filters
require additional buffer memory. This might be an issue for large
data sets, but it should be controllable with bug-free coding (i.e.
having no memory leaks) and sufficiently small chunking of the data sets.
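As a sketch of what the read side of such a filter would compute (the offset value and the NaN convention for mask voxels are assumptions for illustration):

```python
import numpy as np

DENSITY_OFFSET = 10001.0  # assumed shift chosen so all stored values are >= 1

def shift_on_read(stored, offset=DENSITY_OFFSET):
    """Undo the write-time shift so the application sees original densities.
    Stored zeros are reserved for 'nothing' and come back as NaN here."""
    out = stored.astype(np.float64) - offset
    out[stored == 0] = np.nan  # keep mask voxels distinguishable from density
    return out

stored = np.array([0.0, 1.0, 10001.0, 20001.0])
restored = shift_on_read(stored)
# -> [nan, -10000.0, 0.0, 10000.0]
```

Per chunk this is one subtraction pass over the buffer, which is why I would expect it to vanish behind the disk I/O time.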

          Werner



On Sat, 15 Jun 2013 15:46:58 -0500, Dougherty, Matthew T <[email protected]> wrote:

I am seeking an opinion as to the computational load involving a simple HDF filter, which will dictate how to encapsulate images with HDF5.

Problem:
Most of the software that generates images in cryo-EM creates a single-density-modality 3D lattice. The numerical values are generally IEEE floating point, but occasionally unsigned byte; several other numerical representations are permissible. The range of the values is fairly arbitrary, which is the source of the problem. Typically these values range over +/- 10,000. The value zero has no significant meaning, which causes many visualization problems for data between -1 and +1, particularly where division is involved, or where a program attaches a special meaning to zero, such as the void used in masking/clamping. That introduces a new problem: density and mask values become co-mingled indistinguishably, forever altering the distinction and the histogram.

Proposed solution:
Shift all density values to positive, with one as the minimum value. Zero would be reserved to indicate nothingness, such as voxels clamped or excluded during mask segmentation. An alternate approach would be to use NaN, but this has several problems, including breaking a lot of software.

When encapsulating my 3D image into HDF5 I could perform the simple shift, creating metadata indicating the shift. Doing this does not require supporting a shift filter. The alternate approach is to keep the density files as-is during encapsulation and, upon reading the file, dynamically shift the density values using an HDF5 filter.
The image sizes range from 30GB to 4TB.
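A minimal sketch of the write-time shift in the first method (recording the shift as metadata; the attribute name would be whatever convention you adopt):

```python
import numpy as np

def shift_for_storage(density):
    """Shift densities so the minimum stored value is 1.0, reserving 0
    for 'nothing'. Returns the shifted array and the shift amount, which
    would be recorded as metadata (e.g. a dataset attribute)."""
    shift = 1.0 - density.min()
    return density + shift, shift

density = np.array([-250.0, 0.0, 4000.0])
stored, shift = shift_for_storage(density)
# stored.min() is 1.0; originals are recovered as stored - shift
```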

My inclination is to go with the first method, for reasons of computation and of not having to maintain/distribute the filter. But I am curious whether there is any significant computational cost to the second method.


Matthew Dougherty
National Center for Macromolecular Imaging
Baylor College of Medicine
_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org


--
___________________________________________________________________________
Dr. Werner Benger                Visualization Research
Laboratory for Creative Arts and Technology (LCAT)
Center for Computation & Technology at Louisiana State University (CCT/LSU)
211 Johnston Hall, Baton Rouge, Louisiana 70803
Tel.: +1 225 578 4809                        Fax.: +1 225 578-5362
