Hi Matt,
I seem to remember that there already is a compression filter in HDF5 that
allows storing an IEEE floating-point number as an integer value, with a
floating-point scale and offset parameter attached to it. On reading you
then get floating-point values back, limited to the precision of the
underlying integers, but also benefiting from integer compression schemes.
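As a rough sketch of the idea (illustration only, not the HDF5 library's own code - the real filter is configured on a dataset creation property list), the scale/offset transform amounts to:

```python
# Sketch of scale/offset encoding as used conceptually by HDF5's
# scale-offset filter. Illustration only, not the library's code.
def encode(values, scale, offset):
    # Each float becomes an integer: round((x - offset) / scale).
    return [round((x - offset) / scale) for x in values]

def decode(ints, scale, offset):
    # Decoding recovers floats, limited to the precision of the
    # underlying integers (here, the 0.01 scale step).
    return [i * scale + offset for i in ints]

data = [1.23, 4.56, 7.89]
packed = encode(data, scale=0.01, offset=0.0)
restored = decode(packed, scale=0.01, offset=0.0)
# restored agrees with data to within half the scale step.
```

The integers in `packed` then compress better than the raw floats, which is the point of the filter.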
For 'special values' it might be best to use an attribute on the dataset
that tells which value, or even which value range, means something
particular, such as a mask. You could even use HDF5's internal default
fill value for this purpose - this value is returned if, for instance, you
read a chunk of a chunked dataset that doesn't exist on disk, which makes
it a good candidate for marking an 'undefined' region in the data.
http://www.hdfgroup.org/HDF5/doc_resource/H5Fill_Values.html
You would just need to set a fill value that does not occur as valid data.
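A minimal pure-Python illustration of that convention (standing in for what HDF5 does internally; the sentinel value and function names here are hypothetical, not HDF5 API):

```python
# Illustration of the fill-value convention: chunks of a chunked
# dataset that were never written to disk read back as the fill
# value, which an application can treat as an 'undefined' region.
# Hypothetical sketch, not the HDF5 implementation.
FILL = -9999.0  # chosen so it cannot occur as valid data

def read_chunk(stored_chunks, index, chunk_size):
    # Chunks missing from storage are materialized as fill values.
    return stored_chunks.get(index, [FILL] * chunk_size)

stored = {0: [1.5, 2.5, 3.5]}       # only chunk 0 exists on disk
chunk1 = read_chunk(stored, 1, 3)   # chunk 1 was never written
undefined = [v == FILL for v in chunk1]   # all True: masked region
```

In real HDF5 the fill value is set once at dataset creation time and travels with the file, so every reader sees the same convention.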
Shifting data values to circumvent limitations of the visualization
software sounds like a last-resort solution, but if required, such a
shift should be fairly quick with a compression-like filter that is
specific to this software. In the HDF5 file itself you might prefer to
keep data values as close to their original values as possible, so doing
the dynamic shift (specific to each viz software that you use) upon
reading sounds better than doing it during writing. I would expect the
computational cost to be negligible compared to disk I/O - however,
there is an overhead in terms of RAM usage, since compression filters
require additional buffer memory. This might be an issue for large
data sets, but it should be controllable with bug-free coding (i.e.
having no memory leaks) and sufficiently small chunking of the data sets.
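Such a read-time shift filter would amount to a per-chunk transform like the following (a sketch, assuming the shift amount is stored as dataset metadata; names and chunk sizes are hypothetical):

```python
def shift_chunk(chunk, shift):
    # One addition per value as the chunk is read; the cost is
    # negligible compared to disk I/O, but the output buffer adds to
    # the per-chunk RAM footprint, so keep chunks reasonably small.
    return [v + shift for v in chunk]

# Example: shift so the minimum maps to 1.0, leaving 0.0 free
# as a reserved mask value.
chunk = [-3.0, 0.5, 2.0]
shift = 1.0 - min(chunk)            # here 4.0
shifted = shift_chunk(chunk, shift) # [1.0, 4.5, 6.0]
```

Because the shift is a single add per value, the filter stays I/O-bound for any realistic chunk size.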
Werner
On Sat, 15 Jun 2013 15:46:58 -0500, Dougherty, Matthew T
<[email protected]> wrote:
I am seeking an opinion as to the computational load involving a simple
HDF filter, which will dictate how to encapsulate images with HDF5.
Problem:
Most of the software that generates images in cryo-EM creates a single
density modality, 3D lattice. Generally the types of numerical values
are IEEE floating point, but on occasion they are unsigned byte; several
other numerical representations are permissible. The range of the
values is fairly arbitrary, which is the source of the problem.
Typically these values range over +/- 10,000. The value of zero has no
significant meaning, which causes a lot of visualization problems
for data between -1 and +1, particularly involving division, or the fact
that some programs attach a special meaning to zero, such as a void used
in masking/clamping. This introduces a new problem: density and mask
values become co-mingled indistinguishably, forever altering the
distinction and the histogram.
Proposed solution:
Shift all density values to positive, and start with the number one as
the minimum value. Zero would be reserved to indicate nothingness, such
as clamping to exclude density used in mask segmentation. An alternate
approach would be to use NaN, but this has several problems including
breaking a lot of software.
When encapsulating my 3D image into HDF5 I could perform the simple
shift, creating metadata indicating the shift. Doing this does not
require supporting a shift filter.
The alternate approach is to keep the density files as-is during
encapsulating, and upon reading the file I dynamically shift the density
values using an HDF5 filter.
The image sizes range from 30GB to 4TB.
My inclination is to go with the first method, for reasons of
computation, and not having to maintain/distribute the filter.
But I am curious whether there is any significant computational
cost to the second method.
Matthew Dougherty
National Center for Macromolecular Imaging
Baylor College of Medicine
_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
--
___________________________________________________________________________
Dr. Werner Benger Visualization Research
Laboratory for Creative Arts and Technology (LCAT)
Center for Computation & Technology at Louisiana State University (CCT/LSU)
211 Johnston Hall, Baton Rouge, Louisiana 70803
Tel.: +1 225 578-4809 Fax.: +1 225 578-5362