Re: [Hdf-forum] Advice on Storing data with efficient access

Quincey Koziol Wed, 28 Nov 2012 11:13:27 -0800

Hi Michael,

On Nov 27, 2012, at 6:33 PM, Michael Jackson <[email protected]> 
wrote:


> I have a project coming up where we are going to store multiple modalities of 
> data acquired from Scanning Electron Microscopes (EBSD, EDS and ISE**).
> 
> For each modality the data is acquired in a grid fashion. For the ISE data 
> the data is actually a gray scale image so those are easy to store and think 
> about. If the acquired grid is 2048 x 2048 then you have a gray scale image of 
> that same size (unsigned char). This is where things start getting 
> interesting. For the EBSD data there are several signals collected at *each* 
> pixel. Some of the signals are simple scalar values (floats) and we have been 
> storing those also in a 2D array just like the ISE image. But one of the 
> signals is actually itself a 2D image (60x80 pixels). So for example the EBSD 
> sampling grid dimensions are 100 x 75 and at each grid point there is a 60x80 
> array of data.
> 
> The EDS data is much the same except we have a 1D array of 2048 values at 
> each pixel and the dimensions of the EDS sampling grid are 512 x 384.
> 
> I am trying to figure out a balance between efficient storage and easy 
> access. One thought was to store each grid point as its own "group" but that 
> would be hundreds of thousands of groups and I don't think HDF5 is going to 
> react well to that. So the other end of that would be to continue to think of 
> each modality of data as an "Image" and store all the data under a group such 
> as "EDS" as a large multi-dimensional array. So for example in the EBSD data 
> acquisition from above I would have a 4D array (100x75x80x60). What type of 
> attributes should I store with the data set so that later when we are reading 
> through the data we can efficiently grab hyper slabs of the data without 
> having to read the entire data set into memory?

        Actually, groups with hundreds of thousands of links should be fine.

        However, I would lean toward keeping the image structure and either 
using an array datatype (80x60, in the case you gave) or a compound datatype 
for the "pixels".  Another useful option is to create a group for each "image" 
and then store a separate dataset for each field in the array.
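
        To make the layout concrete, here is a rough sketch (plain Python, no 
HDF5 calls) of where one 80x60 pattern sits if the EBSD data is stored as a 
row-major 100x75 grid of 80x60 elements, which is what both the 4D array and 
the array datatype amount to when the dataset uses contiguous storage.  The 
function names are made up for illustration, and the 1 byte per pattern pixel 
is an assumption, since the message does not give the pattern's element type.  
A chunked dataset would be laid out differently on disk.

```python
from math import prod

GRID = (100, 75)     # EBSD sampling grid from the message
PATTERN = (80, 60)   # per-point pattern, ordered as in the 100x75x80x60 shape
ITEMSIZE = 1         # ASSUMPTION: 1 byte per pattern pixel (not stated in the message)


def pattern_byte_range(row, col, grid=GRID, pattern=PATTERN, itemsize=ITEMSIZE):
    """Return (offset, length) in bytes of one pattern in a row-major layout.

    Hypothetical helper: it only does the index arithmetic and assumes the
    whole dataset is stored contiguously, with the grid dimensions slowest.
    """
    if not (0 <= row < grid[0] and 0 <= col < grid[1]):
        raise IndexError("grid point outside the sampling grid")
    nbytes = prod(pattern) * itemsize   # size of one pattern
    index = row * grid[1] + col         # row-major position of the grid point
    return index * nbytes, nbytes


def total_bytes(grid=GRID, pattern=PATTERN, itemsize=ITEMSIZE):
    """Total size in bytes of the whole grid of patterns."""
    return prod(grid) * prod(pattern) * itemsize


if __name__ == "__main__":
    offset, length = pattern_byte_range(1, 0)
    print("pattern (1, 0) starts at byte", offset, "and spans", length, "bytes")
    print("whole EBSD dataset:", total_bytes(), "bytes")
```

        Under those assumptions each pattern is a single contiguous run of 
bytes, so selecting one grid point touches only that run and not the rest of 
the dataset.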

        Quincey

> I hope all of that was clear enough to elicit some advice on storage. Thanks 
> for any help. Just for clarification the sizes of the data sets are for our 
> "experimental" data sets where we are just trying to figure this out. The 
> real data sets will likely be multi-gigabytes in size for each "slice" of 
> data where we may have 250 slices.
> 
> 
> ** EBSD - Electron Backscatter Diffraction
>   EDS - Energy dispersive Spectra
>   ISE - Ion Induced Secondary Electron Image
> 
> Thanks for any help or advice.
> ___________________________________________________________
> Mike Jackson                    Principal Software Engineer
> BlueQuartz Software                            Dayton, Ohio
> [email protected]              www.bluequartz.net
> 
> 
> _______________________________________________
> Hdf-forum is for HDF software users discussion.
> [email protected]
> http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org


