Thanks for all the feedback. I'm going to start experimenting with some variations over the next few weeks to see which ones are efficient from an IO standpoint and reasonably "easy" to get at from a coding perspective.
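As a possible starting point for those experiments, here is a rough back-of-the-envelope sketch in plain Python (no HDF5 library involved). It estimates how many bytes would have to come off disk for two typical reads of the 4D EBSD array under a few candidate chunk shapes, on the basis that a chunked HDF5 dataset is read one whole chunk at a time. The chunk shapes, the 1-byte sample size, and the helper names `raw_bytes` and `chunks_touched` are all assumptions for illustration. Nothing here comes from the HDF5 API, and chunk caching and compression are ignored.

```python
from math import prod

def raw_bytes(shape, itemsize):
    """Uncompressed size in bytes of an array with the given shape."""
    return prod(shape) * itemsize

def chunks_touched(start, count, chunk):
    """Number of chunks of a regular chunk grid overlapped by a hyperslab."""
    n = 1
    for s, c, k in zip(start, count, chunk):
        first = s // k              # index of the first chunk hit along this axis
        last = (s + c - 1) // k     # index of the last chunk hit along this axis
        n *= last - first + 1
    return n

# EBSD example from the thread: 100 x 75 grid, one 80 x 60 pattern per point.
ebsd_shape = (100, 75, 80, 60)
itemsize = 1  # assumption: 8-bit samples; the thread does not state the type

# Hypothetical chunk shapes to compare.
layouts = {
    "one pattern per chunk": (1, 1, 80, 60),
    "one grid row per chunk": (1, 75, 80, 60),
    "small 4D tiles": (10, 15, 8, 6),
}

# Two access patterns, each given as (start, count).
reads = {
    "single pattern at grid point (10, 20)": ((10, 20, 0, 0), (1, 1, 80, 60)),
    "one detector pixel over the whole grid": ((0, 0, 40, 30), (100, 75, 1, 1)),
}

print("total raw size:", raw_bytes(ebsd_shape, itemsize), "bytes")
for layout_name, chunk in layouts.items():
    for read_name, (start, count) in reads.items():
        n = chunks_touched(start, count, chunk)
        wanted = raw_bytes(count, itemsize)
        fetched = n * raw_bytes(chunk, itemsize)
        print(f"{layout_name:24s} | {read_name:40s} | "
              f"{n:5d} chunks, {fetched:9d} bytes read for {wanted} wanted")
```

With these numbers, the "one pattern per chunk" layout is ideal for pulling out a single pattern, but building a map of one detector pixel touches every chunk in the dataset. The tiled layout trades some of each. Real timings will depend on the chunk cache and any compression filter, so this only narrows down which variations are worth benchmarking.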
Thanks
___________________________________________________________
Mike Jackson                    Principal Software Engineer
BlueQuartz Software                            Dayton, Ohio
[email protected]                        www.bluequartz.net

On Nov 28, 2012, at 2:12 PM, Quincey Koziol wrote:

> Hi Michael,
>
> On Nov 27, 2012, at 6:33 PM, Michael Jackson <[email protected]> wrote:
>
>> I have a project coming up where we are going to store multiple modalities
>> of data acquired from Scanning Electron Microscopes (EBSD, EDS and ISE**).
>>
>> For each modality the data is acquired in a grid fashion. For the ISE data
>> the data is actually a gray scale image, so those are easy to store and
>> think about. If the acquired grid is 2048 x 2048 then you have a gray scale
>> image of that same size (unsigned char). This is where things start getting
>> interesting. For the EBSD data there are several signals collected at *each*
>> pixel. Some of the signals are simple scalar values (floats) and we have
>> been storing those also in a 2D array just like the ISE image. But one of
>> the signals is actually itself a 2D image (60x80 pixels). So for example the
>> EBSD sampling grid dimensions are 100 x 75 and at each grid point there is a
>> 60x80 array of data.
>>
>> The EDS data is much the same, except we have a 1D array of 2048 values at
>> each pixel and the dimensions of the EDS sampling grid are 512 x 384.
>>
>> I am trying to figure out a balance between efficient storage and easy
>> access. One thought was to store each grid point as its own "group", but
>> that would be hundreds of thousands of groups and I don't think HDF5 is
>> going to react well to that. So the other end of that would be to continue
>> to think of each modality of data as an "Image" and store all the data under
>> a group such as "EDS" as a large multi-dimensional array. So for example in
>> the EBSD data acquisition from above I would have a 4D array
>> (100x75x80x60). What type of attributes should I store with the data set so
>> that later, when we are reading through the data, we can efficiently grab
>> hyperslabs of the data without having to read the entire data set into
>> memory?
>
> Actually, groups with hundreds of thousands of links should be fine.
>
> However, I would lean toward keeping the image structure and either
> using an array datatype (80x60, in the case you gave) or a compound datatype
> for the "pixels". Another useful option is to create a group for each
> "image" and then store a separate dataset for each field in the array.
>
> Quincey
>
>> I hope all of that was clear enough to elicit some advice on storage. Just
>> for clarification, the sizes of the data sets are for our "experimental"
>> data sets where we are just trying to figure this out. The real data sets
>> will likely be multi-gigabytes in size for each "slice" of data, where we
>> may have 250 slices.
>>
>> ** EBSD - Electron Backscatter Diffraction
>>    EDS  - Energy Dispersive Spectra
>>    ISE  - Ion Induced Secondary Electron Image
>>
>> Thanks for any help or advice.
>> ___________________________________________________________
>> Mike Jackson                    Principal Software Engineer
>> BlueQuartz Software                            Dayton, Ohio
>> [email protected]                        www.bluequartz.net

_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
