Your question is a good one. I would need to be able to:

- pull a full record (or set of records) within a set of time bounds;
- pull a single field from all records, for all times, as a time series; and
- possibly pull all records whose value in some field falls within a given range, for all times.

I'm thinking of something similar to what you have done (I think) - that is, to self-index the file. The index would live in its own dataset, with an array of time stamps, perhaps a few other fields, and references to the actual data records (object references, I think, is what HDF5 calls them).
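Roughly like this, as a minimal sketch (HDF5 1.8 API; the one-dataset-per-record layout and every name here are assumptions, just for illustration):

/* Sketch: self-indexing a file of packet records.  A small index
 * dataset holds {time stamp, object reference} pairs pointing at the
 * datasets that hold the actual records. */
#include "hdf5.h"

typedef struct {
    double     time;  /* packet time stamp */
    hobj_ref_t ref;   /* reference to the record's own dataset */
} index_entry_t;

static hid_t index_type(void)
{
    hid_t t = H5Tcreate(H5T_COMPOUND, sizeof(index_entry_t));
    H5Tinsert(t, "time", HOFFSET(index_entry_t, time), H5T_NATIVE_DOUBLE);
    H5Tinsert(t, "ref",  HOFFSET(index_entry_t, ref),  H5T_STD_REF_OBJ);
    return t;
}

/* Make an index entry for a record already written at 'path'. */
static index_entry_t make_entry(hid_t file, const char *path, double time)
{
    index_entry_t e;
    e.time = time;
    H5Rcreate(&e.ref, file, path, H5R_OBJECT, -1);
    return e;
}

/* To pull data back, read only the small index dataset, find the
 * entries whose time falls in the wanted bounds, and follow the
 * reference straight to the record's dataset. */
static hid_t open_record(hid_t file, const index_entry_t *e)
{
    return H5Rdereference(file, H5R_OBJECT, &e->ref); /* caller closes */
}

A reader then scans only the small index and jumps directly to the records it needs, never passing through the bulk of the file.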
-Val

On Feb 11, 2011, at 10:56 AM, Mitchell, Scott - IS wrote:

> I'm doing something similar to what you are looking at. I have data
> coming in from multiple instruments; it goes through processing and
> results in one or several C# structures/arrays. In my example each
> instrument type has a structure containing Packet Tables with
> associated time axes/scales. The packet table structure mimics the
> instrument data structures.
>
> Metadata is held in Attributes and other Packet Tables. I've created
> a standard across the program, with specifics defined for each
> instrument.
>
> I end up storing each individual instrument's data in its own file.
> In most cases a single thread processes and stores data, so I don't
> have to worry about synchronization (as much).
>
> I believe you'll want to store each data type in its own dataset or
> file, both for the ability to search by data type and because of
> data-length issues. How are you expecting to search?
>
> In my case, we allow users to 'play back' the data. I keep the time
> scale as a separate dataset so I can do random-access lookups without
> having to load large data records to find a specific time.
>
> Scott
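A minimal sketch of the time-scale lookup Scott describes: a parallel 1-D time dataset, read once into memory (it is small), searched there, and used to fetch a single packet by index. The packet struct, the sorted-time assumption, and all names are illustrative only (HDF5 1.8 high-level API):

/* Sketch: random-access playback against a packet table with a
 * separate, monotonically increasing time axis. */
#include "hdf5.h"
#include "hdf5_hl.h"

typedef struct {        /* stand-in for one instrument's packet */
    double time;
    float  sample[64];
} packet_t;

/* Binary-search the in-memory copy of the time axis; returns the
 * first index whose stamp is >= 'want'. */
static hsize_t find_time(const double *times, hsize_t n, double want)
{
    hsize_t lo = 0, hi = n;
    while (lo < hi) {
        hsize_t mid = lo + (hi - lo) / 2;
        if (times[mid] < want) lo = mid + 1;
        else                   hi = mid;
    }
    return lo;
}

/* Locate by time, then read just that one packet - no scan of the
 * large records. */
static void read_at(hid_t table, const double *times, hsize_t n,
                    double want, packet_t *out)
{
    hsize_t i = find_time(times, n, want);
    if (i < n)
        H5PTread_packets(table, i, 1, out);
}

The design point is that the time axis is tiny next to the records, so a playback seek never touches the packet table except for the one record it needs.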
>> -----Original Message-----
>> From: [email protected] [mailto:[email protected]]
>> On Behalf Of Val Schmidt
>> Sent: Thursday, February 10, 2011 6:10 PM
>> To: [email protected]
>> Subject: [Hdf-forum] hdf suitability for packetized data
>>
>> Hello everyone,
>>
>> I am new to HDF and am trying to understand whether or not it might
>> be a suitable file format for my application. The data I'm interested
>> in storing is usually written by the collecting instrument to basic
>> binary files of concatenated packets (think C structures), each of
>> which contains a header with a time stamp, packet format, packet
>> identifier, and packet size, followed by the data itself (arrays) and
>> associated metadata. There are tens of types of packets; they may
>> come in any order and are usually written to the file sequentially.
>> Packets contain 10 to 100 fields, some of which may be arrays of
>> various sizes.
>>
>> This format allows one to index a file relatively quickly by passing
>> through it and parsing only the headers. One can then use the index
>> to pull subsets of the data non-linearly, sometimes simultaneously in
>> multiple threads, for quite fast reading. The problem is that every
>> instrument manufacturer has their own method of encoding packets, and
>> a single format is needed for archival purposes.
>>
>> My question to you is: how might a similar model be implemented in
>> HDF5 such that the same kind of indexing and parallel data retrieval
>> is possible? What is to be avoided is the need to read through a file
>> sequentially to get to the fields to extract.
>>
>> It seems like HDF5 should handle this kind of thing well, but because
>> I am inexperienced, and because most folks using it seem to be
>> storing relatively small numbers of very large arrays (imagery in
>> many cases) rather than relatively large numbers of smaller records
>> with fewer fields and smaller arrays, it is not clear to me how such
>> an implementation might perform. So I guess I'm also asking: what is
>> the relative penalty for writing lots of small sets of data?
>>
>> I hope this makes sense.
>>
>> Thanks in advance,
>>
>> Val
>> ------------------------------------------------------
>> Val Schmidt
>> CCOM/JHC
>> University of New Hampshire
>> Chase Ocean Engineering Lab
>> 24 Colovos Road
>> Durham, NH 03824
>> e: vschmidt [AT] ccom.unh.edu
>> m: 614.286.3726
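On the original question of pulling one field for all times without a sequential pass: if each packet type lives in its own compound dataset, a read can name just a single member of the compound and HDF5 hands back only that field. A minimal sketch (the dataset path, member name, and types are all assumptions):

/* Sketch: reading a single field of a compound packet dataset as a
 * time series, using HDF5's compound field subsetting (1.8 API). */
#include "hdf5.h"
#include <stdlib.h>

static double *read_depth_series(hid_t file, hsize_t *n_out)
{
    hid_t   dset  = H5Dopen(file, "/packets/nav", H5P_DEFAULT);
    hid_t   space = H5Dget_space(dset);
    hsize_t n;
    H5Sget_simple_extent_dims(space, &n, NULL);

    /* Memory type holding just the one member we want; the library
     * returns that field alone, for every record. */
    hid_t mtype = H5Tcreate(H5T_COMPOUND, sizeof(double));
    H5Tinsert(mtype, "depth", 0, H5T_NATIVE_DOUBLE);

    double *series = malloc(n * sizeof *series);
    H5Dread(dset, mtype, H5S_ALL, H5S_ALL, H5P_DEFAULT, series);

    H5Tclose(mtype);
    H5Sclose(space);
    H5Dclose(dset);
    *n_out = n;
    return series;
}

As for the penalty of many small writes: with chunked storage (which the packet-table API uses underneath), small appends are buffered through the chunk cache rather than each touching the disk, so chunk size becomes the main tuning knob.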
