I have a project coming up where we are going to store multiple modalities of 
data acquired from Scanning Electron Microscopes (EBSD, EDS and ISE**).

For each modality the data is acquired on a grid. The ISE data is simply a 
grayscale image, so it is easy to store and think about: if the acquisition 
grid is 2048 x 2048, you get a grayscale image of that same size (unsigned 
char). The EBSD data is where things get interesting, because several signals 
are collected at *each* pixel. Some of the signals are simple scalar values 
(floats), and we have been storing those in 2D arrays just like the ISE image. 
But one of the signals is itself a 2D image (60x80 pixels). So, for example, 
if the EBSD sampling grid is 100 x 75, there is a 60x80 array of data at each 
grid point.

The EDS data is much the same, except that we have a 1D array of 2048 values 
at each pixel and the dimensions of the EDS sampling grid are 512 x 384.
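
To put rough numbers on the three layouts described above, here is a small 
Python sketch that computes the grid point count, the full array shape and 
the raw size for each modality. The grid and per-point shapes come from the 
description above; the element sizes for the EBSD patterns (1 byte) and the 
EDS spectra (4 bytes) are assumptions made only for illustration, and the 
names `MODALITIES` and `summarize` are hypothetical.

```python
from math import prod

# Grid and per-point shapes are taken from the post. The itemsize values for
# EBSD and EDS are assumptions; only the ISE type (unsigned char) is stated.
MODALITIES = {
    "ISE":  {"grid": (2048, 2048), "per_point": (),       "itemsize": 1},
    "EBSD": {"grid": (100, 75),    "per_point": (60, 80), "itemsize": 1},
    "EDS":  {"grid": (512, 384),   "per_point": (2048,),  "itemsize": 4},
}


def summarize(spec):
    """Return (number of grid points, full array shape, raw size in bytes)."""
    shape = spec["grid"] + spec["per_point"]
    return prod(spec["grid"]), shape, prod(shape) * spec["itemsize"]


for name, spec in MODALITIES.items():
    points, shape, nbytes = summarize(spec)
    print(f"{name}: {points} grid points, shape {shape}, "
          f"{nbytes / 2**20:.1f} MiB")
```

The grid point counts are the number of groups that a one-group-per-point 
layout would need for each modality.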

I am trying to figure out a balance between efficient storage and easy access. 
One thought was to store each grid point as its own "group" but that would be 
hundreds of thousands of groups and I don't think HDF5 is going to react well 
to that. So the other end of that would be to continue to think of each 
modality of data as an "Image" and store all the data under a group such as 
"EDS" as a large multi-dimensional array. So for example in the EBSD data 
acquisition from above I would have a 4D array (100x75x80x60). What type of 
attributes should I store with the data set so that later, when we are reading 
through the data, we can efficiently grab hyperslabs without having to read 
the entire data set into memory?
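
One way to reason about the hyperslab question is in terms of HDF5's chunked 
storage layout, where a dataset is stored in fixed-shape chunks and a read 
pulls in every chunk that the hyperslab overlaps. The chunk shape is set when 
the dataset is created (`H5Pset_chunk` in the C API, or the `chunks=` argument 
to `create_dataset` in h5py). The sketch below is plain Python, not HDF5 code: 
it only counts the bytes in the chunks a hyperslab touches, assuming 1-byte 
elements and ignoring the chunk cache and compression. The candidate chunk 
shapes and the function names are illustrative assumptions, not 
recommendations.

```python
from math import prod


def chunks_touched(start, count, chunk_shape):
    """Per dimension, the range of chunk indices a hyperslab overlaps."""
    return [range(s // c, (s + n - 1) // c + 1)
            for s, n, c in zip(start, count, chunk_shape)]


def read_cost(start, count, chunk_shape, itemsize=1):
    """Return (bytes in all touched chunks, bytes actually requested)."""
    n_chunks = prod(len(r) for r in chunks_touched(start, count, chunk_shape))
    return n_chunks * prod(chunk_shape) * itemsize, prod(count) * itemsize


# EBSD array from the post: 100 x 75 grid with an 80 x 60 pattern per point.
# Two access patterns, each given as (start, count):
one_pattern = ((42, 17, 0, 0), (1, 1, 80, 60))      # one full pattern
one_pixel_map = ((0, 0, 40, 30), (100, 75, 1, 1))   # one pattern pixel, whole grid

# Candidate chunk shapes (assumed for illustration only).
for chunk_shape in [(1, 1, 80, 60), (10, 15, 80, 60), (100, 75, 1, 1)]:
    for label, (start, count) in [("pattern", one_pattern),
                                  ("map", one_pixel_map)]:
        disk, wanted = read_cost(start, count, chunk_shape)
        print(f"chunks={chunk_shape} {label}: "
              f"{disk} bytes touched for {wanted} requested")
```

Under these assumptions, a chunk shape that matches one access pattern 
exactly makes the other one touch the whole data set, so the choice depends 
on which kind of read is expected to dominate.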

I hope all of that was clear enough to elicit some advice on storage. Just 
for clarification, the sizes given above are for our "experimental" data sets, 
where we are just trying to figure this out. The real data sets will likely be 
multiple gigabytes in size for each "slice" of data, and we may have 250 
slices.


** EBSD - Electron Backscatter Diffraction
   EDS - Energy Dispersive Spectroscopy
   ISE - Ion Induced Secondary Electron Image

Thanks for any help or advice.
___________________________________________________________
Mike Jackson                    Principal Software Engineer
BlueQuartz Software                            Dayton, Ohio
[email protected]              www.bluequartz.net

