Hello everyone,

I am new to HDF and am trying to understand whether it might be a
suitable file format for my application. The data I want to store is
usually written by the collecting instrument to basic binary files of
concatenated packets (think C structures). Each packet contains a header with
a time stamp, packet format, packet identifier, and packet size, followed by
the data itself (arrays) and associated metadata. There are tens of packet
types, which may come in any order, and they are usually written to the file
sequentially. Packets contain between 10 and 100 fields, some of which may be
arrays of data of various sizes.
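
To make the packet layout concrete, here is a minimal sketch in Python. The
header layout (field order, widths, byte order) and the names HEADER,
pack_packet, and unpack_header are assumptions made up for illustration; every
real instrument defines its own, and I am treating "packet size" as the size of
the payload in bytes.

```python
import struct

# Hypothetical header layout, assumed for illustration only:
# little-endian float64 time stamp, uint16 packet format, uint16 packet
# identifier, uint32 payload size in bytes.
HEADER = struct.Struct("<dHHI")


def pack_packet(timestamp, packet_format, packet_id, payload):
    """Return one packet: a fixed-size header followed by the raw payload."""
    header = HEADER.pack(timestamp, packet_format, packet_id, len(payload))
    return header + payload


def unpack_header(buf, offset=0):
    """Decode only the header that starts at `offset` in `buf`."""
    timestamp, packet_format, packet_id, size = HEADER.unpack_from(buf, offset)
    return {
        "timestamp": timestamp,
        "format": packet_format,
        "packet_id": packet_id,
        "size": size,
    }


# Build one packet with a 4-byte payload and read its header back.
packet = pack_packet(1234.5, 1, 7, b"\x01\x02\x03\x04")
header = unpack_header(packet)
```

In the real files the payload is itself a structure of 10 to 100 fields rather
than an opaque byte string, but the header-then-payload framing is the same.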

This format allows one to index a file relatively quickly by passing through
the file and parsing only these headers. One can then use the index to pull
subsets of the data in a non-sequential fashion, sometimes simultaneously in
multiple threads, for quite fast reading. The problem is that every instrument
manufacturer has their own method of encoding packets, and a single format is
needed for archival purposes.
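
The indexing approach described above can be sketched roughly as follows. This
is only an illustration under assumptions: the header layout is the same
made-up one (time stamp, format, identifier, payload size), the "file" is an
in-memory byte string, and build_index and read_payload are hypothetical
helper names, not part of any vendor's software.

```python
import io
import struct
from concurrent.futures import ThreadPoolExecutor

# Hypothetical header layout, assumed for illustration only:
# float64 time stamp, uint16 format, uint16 packet id, uint32 payload size.
HEADER = struct.Struct("<dHHI")


def build_index(stream):
    """Scan headers only, seeking past each payload without reading it.

    Returns a list of (offset, timestamp, packet_id, size) tuples.
    """
    index = []
    stream.seek(0)
    while True:
        offset = stream.tell()
        raw = stream.read(HEADER.size)
        if len(raw) < HEADER.size:
            break  # end of file (or a truncated trailing header)
        timestamp, _packet_format, packet_id, size = HEADER.unpack(raw)
        index.append((offset, timestamp, packet_id, size))
        stream.seek(size, io.SEEK_CUR)  # skip the payload
    return index


def read_payload(data, entry):
    """Pull one payload straight from its indexed position."""
    offset, _timestamp, _packet_id, size = entry
    start = offset + HEADER.size
    return bytes(data[start:start + size])


# Build a small fake file with two packet types interleaved.
payloads = [b"aa", b"bbbb", b"c", b"dddddd"]
packet_ids = [1, 2, 1, 2]
data = b"".join(
    HEADER.pack(float(i), 1, pid, len(p)) + p
    for i, (pid, p) in enumerate(zip(packet_ids, payloads))
)

index = build_index(io.BytesIO(data))

# Select only packets of type 2 and fetch them from worker threads.
wanted = [entry for entry in index if entry[2] == 2]
with ThreadPoolExecutor(max_workers=2) as pool:
    subset = list(pool.map(lambda entry: read_payload(data, entry), wanted))
```

With a file on disk, each thread would need its own file handle or a
positional read so that seeks in one thread do not disturb another; the
in-memory buffer here sidesteps that.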

My question to you is how a similar model might be implemented in HDF5 such
that the same kind of indexing and parallel data retrieval is possible. What I
want to avoid is the need to read through a file sequentially to get to the
fields I want to extract.

It seems like HDF5 should handle this kind of thing well. However, I am
inexperienced, and most folks using it seem to be storing relatively
small numbers of very large arrays (imagery in many cases) rather than
relatively large numbers of small records, each with a modest number of fields
and small arrays, so it is not clear to me how such an implementation might
perform. So I guess I'm also asking: what is the relative penalty for writing
lots of small sets of data?

I hope this makes sense. 

Thanks in advance,

Val
------------------------------------------------------
Val Schmidt
CCOM/JHC
University of New Hampshire
Chase Ocean Engineering Lab
24 Colovos Road
Durham, NH 03824
e: vschmidt [AT] ccom.unh.edu
m: 614.286.3726



_______________________________________________
Hdf-forum is for HDF software users discussion.
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org