On Tue, Nov 27, 2012 at 5:33 PM, Michael Jackson <[email protected]> wrote:
> I have a project coming up where we are going to store multiple modalities
> of data acquired from Scanning Electron Microscopes (EBSD, EDS and ISE**).
>
> For each modality the data is acquired in a grid fashion. For the ISE data
> the data is actually a gray scale image, so those are easy to store and
> think about: if the acquired image is 2048 x 2048, then you have a gray
> scale image of that same size (unsigned char). This is where things start
> getting interesting. For the EBSD data there are several signals collected
> at *each* pixel. Some of the signals are simple scalar values (floats) and
> we have been storing those also in a 2D array, just like the ISE image.
> But one of the signals is actually itself a 2D image (60x80 pixels). So,
> for example, the EBSD sampling grid dimensions are 100 x 75 and at each
> grid point there is a 60x80 array of data.
>
> The EDS data is much the same, except we have a 2048-element 1D array at
> each pixel and the dimensions of the EDS sampling grid are 512 x 384.
>
> I am trying to figure out a balance between efficient storage and easy
> access. One thought was to store each grid point as its own "group", but
> that would be hundreds of thousands of groups and I don't think HDF5 is
> going to react well to that. The other end of that would be to continue
> to think of each modality of data as an "image" and store all the data
> under a group such as "EDS" as a large multi-dimensional array. So, for
> example, in the EBSD data acquisition from above I would have a 4D array
> (100x75x60x80). What type of attributes should I store on the data set so
> that later, when we are reading through the data, we can efficiently grab
> hyperslabs of the data without having to read the entire data set into
> memory?
>
> I hope all of that was clear enough to elicit some advice on storage.
> Thanks for any help.
> Just for clarification: the sizes of the data sets above are for our
> "experimental" data sets where we are just trying to figure this out. The
> real data sets will likely be multiple gigabytes in size for each "slice"
> of data, and we may have 250 slices.
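Not part of the original thread, but the chunking question above can be sketched with h5py (the file and dataset names here are made up; the dimensions come from the EBSD example in the question). Storing the patterns as a single 4D dataset chunked per grid point means a hyperslab read of one pattern touches exactly one chunk on disk:

```python
import numpy as np
import h5py

# Dimensions from the EBSD example: a 100 x 75 sampling grid with a
# 60 x 80 pattern image at each grid point. File and dataset names
# are invented for illustration.
with h5py.File("ebsd_example.h5", "w") as f:
    dset = f.create_dataset(
        "EBSD/Patterns",
        shape=(100, 75, 60, 80),
        dtype="uint8",
        # One chunk per grid point: a hyperslab read of a single
        # pattern pulls exactly one chunk from disk.
        chunks=(1, 1, 60, 80),
        compression="gzip",
    )
    # Attributes recording the layout, so readers know how to slice.
    dset.attrs["grid_dims"] = (100, 75)
    dset.attrs["pattern_dims"] = (60, 80)
    # Write one synthetic pattern at grid point (10, 20).
    dset[10, 20] = (np.arange(60 * 80) % 256).astype("uint8").reshape(60, 80)

# Hyperslab read of a single pattern; the rest of the array stays on disk.
with h5py.File("ebsd_example.h5", "r") as f:
    pattern = f["EBSD/Patterns"][10, 20]  # shape (60, 80)
```

The same pattern applies to the EDS case: a (512, 384, 2048) dataset chunked as (1, 1, 2048), or with a larger chunk footprint if whole rows of spectra are usually read together.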
I am sure many people with more experience will have better advice (which I
would also like to learn from), but for our data I decided to store things
as matrix-like as possible and then use some sort of indexing to access the
data; PyTables has some indexing functionality I plan to rely on. It also
depends on the application you want to use to process the data (Python, in
my case, is the primary one). I think there are also proprietary bitmap
indexing schemes to make data access faster, or you may end up building an
index on your own.

> ** EBSD - Electron Backscatter Diffraction
>    EDS  - Energy Dispersive Spectra
>    ISE  - Ion Induced Secondary Electron Image
>
> Thanks for any help or advice.
>
> ___________________________________________________________
> Mike Jackson                    Principal Software Engineer
> BlueQuartz Software                            Dayton, Ohio
> [email protected]                    www.bluequartz.net

dashesy

_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
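As a rough sketch of the PyTables in-kernel indexing dashesy alludes to (the record layout and column names below are invented for illustration, not from the thread): store one row per grid point, index a scalar column, and query it without scanning the whole table.

```python
import tables as tb

# Hypothetical per-pixel record: grid coordinates plus one scalar signal.
class Pixel(tb.IsDescription):
    x = tb.UInt16Col()
    y = tb.UInt16Col()
    iq = tb.Float32Col()  # e.g. an image-quality scalar per grid point

with tb.open_file("pixel_index_example.h5", "w") as f:
    table = f.create_table("/", "pixels", Pixel)
    row = table.row
    # Fill with synthetic data over a 100 x 75 grid.
    for x in range(100):
        for y in range(75):
            row["x"], row["y"], row["iq"] = x, y, float(x * y)
            row.append()
    table.flush()
    # Build an on-disk index so in-kernel queries avoid a full scan.
    table.cols.iq.create_index()
    hits = table.read_where("iq > 7000")
```

This trades the natural 2D layout of the "image" approach for fast attribute-based lookups; for the pattern/spectrum payloads themselves, a chunked multi-dimensional dataset is still the more direct fit.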
