On Tue, Nov 27, 2012 at 5:33 PM, Michael Jackson
<[email protected]> wrote:
> I have a project coming up where we are going to store multiple modalities of 
> data acquired from Scanning Electron Microscopes (EBSD, EDS and ISE**).
>
> For each modality the data is acquired in a grid fashion. The ISE data 
> is actually a gray scale image, so those are easy to store and think 
> about: if the acquisition grid is 2048 x 2048 then you have a gray scale 
> image of that same size (unsigned char). This is where things start getting 
> interesting. For the EBSD data there are several signals collected at *each* 
> pixel. Some of the signals are simple scalar values (floats) and we have been 
> storing those also in a 2D array just like the ISE image. But one of the 
> signals is actually itself a 2D image (60x80 pixels). So, for example, if the 
> EBSD sampling grid dimensions are 100 x 75, then at each grid point there is 
> a 60x80 array of data.
>
> The EDS data is much the same, except we have a 2048-element 1D array at each 
> pixel, and the EDS sampling grid dimensions are 512 x 384.
>
> I am trying to figure out a balance between efficient storage and easy 
> access. One thought was to store each grid point as its own "group" but that 
> would be hundreds of thousands of groups and I don't think HDF5 is going to 
> react well to that. So the other end of that would be to continue to think of 
> each modality of data as an "Image" and store all the data under a group such 
> as "EDS" as a large multi-dimensional array. So for example in the EBSD data 
> acquisition from above I would have a 4D array (100x75x80x60). What type of 
> attributes should I store with the data set so that later, when we are 
> reading through the data, we can efficiently grab hyperslabs of the data 
> without having to read the entire data set into memory?
>
> I hope all of that was clear enough to elicit some advice on storage. Thanks 
> for any help. Just for clarification the sizes of the data sets are for our 
> "experimental" data sets where we are just trying to figure this out. The 
> real data sets will likely be multi-gigabytes in size for each "slice" of 
> data where we may have 250 slices.
>
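For what it's worth, here is a minimal sketch of the "one big multi-dimensional array" option using h5py (the file name and dataset path are made up for illustration). The key point is the chunk shape: chunking the 4D dataset one pattern per chunk means a hyperslab read of a single scan point only touches one chunk on disk, so you never pull the whole dataset into memory.

```python
import numpy as np
import h5py

# Hypothetical file and dataset names. The EBSD scan is 100 x 75 points,
# each carrying a 60x80 pattern, stored as a single 4D dataset.
with h5py.File("ebsd_demo.h5", "w") as f:
    dset = f.create_dataset(
        "EBSD/patterns",
        shape=(100, 75, 60, 80),
        dtype="f4",
        chunks=(1, 1, 60, 80),   # one pattern per chunk
        compression="gzip",
    )
    # Write one scan point's pattern.
    dset[10, 20] = np.ones((60, 80), dtype="f4")

with h5py.File("ebsd_demo.h5", "r") as f:
    # Hyperslab read: HDF5 only reads the chunk(s) covering this selection.
    pattern = f["EBSD/patterns"][10, 20]

print(pattern.shape)
```

The same pattern applies to the EDS case with a (512, 384, 2048) dataset chunked as (1, 1, 2048), one spectrum per chunk. You could record the grid spacing and signal names as HDF5 attributes on the dataset so readers know how to interpret the axes.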

I am sure many people with more experience will have better advice
(which I would also like to learn from), but for our data I decided to
store everything as matrix-like as possible and then use some form of
indexing to access it; PyTables has indexing functionality that I plan
to rely on. It also depends on the application you will use to process
the data (Python is the primary one in my case). There are also
proprietary bitmap indexing schemes that make data access faster, or
you may end up building an index of your own.
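As a rough sketch of the PyTables indexing I mean (the table layout and column names here are hypothetical, one row per scan point): you can create an index on a column and then query with `where()` so only matching rows are read, rather than scanning the whole table.

```python
import tables  # PyTables

# Hypothetical schema: grid coordinates plus one scalar signal per point.
class ScanPoint(tables.IsDescription):
    x = tables.Int32Col()
    y = tables.Int32Col()
    iq = tables.Float32Col()   # e.g. an image-quality scalar

with tables.open_file("scan_demo.h5", "w") as f:
    table = f.create_table("/", "points", ScanPoint)
    row = table.row
    for x in range(4):
        for y in range(3):
            row["x"], row["y"], row["iq"] = x, y, float(x * y)
            row.append()
    table.flush()
    table.cols.x.create_index()   # index the x column for faster queries
    # Only rows satisfying the condition are materialized.
    hits = [float(r["iq"]) for r in table.where("(x == 2) & (y == 1)")]

print(hits)
```

This works well for the scalar signals; for the per-point 60x80 patterns or 2048-channel spectra, a chunked multi-dimensional array (as you describe) is probably the better fit, with a table like this acting as the index into it.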

>
> ** EBSD - Electron Backscatter Diffraction
>    EDS - Energy Dispersive Spectra
>    ISE - Ion Induced Secondary Electron Image
>
> Thanks for any help or advice.
> ___________________________________________________________
> Mike Jackson                    Principal Software Engineer
> BlueQuartz Software                            Dayton, Ohio
> [email protected]              www.bluequartz.net
>

dashesy

_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
