Your question is a good one. 

I would need to be able to pull a full record (or set of records) within a set
of time bounds.
I would need to be able to pull some field from all records, for all times, as
a time series.
I might need to be able to pull all records whose value for some field falls
within a given range, for all times.

I'm thinking of something similar to what you have done (I think) - that is, to
self-index the file. The index would live in its own dataset, with an array of
time records, perhaps a few other fields, and references (object references, I
believe, in HDF5 terms) to the actual data records.
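
Something like the following is the shape of what I mean - a minimal sketch
against the HDF5 C API, assuming one dataset per record; the names here
(index_entry_t, "time", "ref") are mine, not anything standard:

    #include <hdf5.h>

    /* One index entry: a time stamp plus an object reference pointing
       at the dataset that holds the full record. */
    typedef struct {
        double     time;
        hobj_ref_t ref;
    } index_entry_t;

    /* Compound datatype matching index_entry_t, for the index dataset. */
    hid_t make_index_type(void)
    {
        hid_t t = H5Tcreate(H5T_COMPOUND, sizeof(index_entry_t));
        H5Tinsert(t, "time", HOFFSET(index_entry_t, time), H5T_NATIVE_DOUBLE);
        H5Tinsert(t, "ref",  HOFFSET(index_entry_t, ref),  H5T_STD_REF_OBJ);
        return t;
    }

    /* Fill one entry that points at an existing record dataset. */
    void index_record(hid_t file, const char *record_path, double time,
                      index_entry_t *entry)
    {
        entry->time = time;
        H5Rcreate(&entry->ref, file, record_path, H5R_OBJECT, -1);
    }

The entries would accumulate in a single extendible 1-D dataset; reading that
one small dataset back gives the time array to search, and H5Rdereference()
then opens exactly the record wanted without touching anything else.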

-Val

On Feb 11, 2011, at 10:56 AM, Mitchell, Scott - IS wrote:

> I'm doing something similar to what you are looking at. I have data coming in
> from multiple instruments; it goes through processing and results in one or
> several C# structures/arrays. In my example, each instrument type has a
> structure containing Packet Tables with associated time axes/scales. The
> packet table structure mimics the instrument data structures.
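> 
> For illustration, here's a rough sketch of that pattern using the HDF5
> high-level Packet Table (H5PT) API - the packet_t layout below is an invented
> placeholder, not our actual instrument structure, and error handling is
> omitted:
> 
>     #include <hdf5.h>
>     #include <hdf5_hl.h>
> 
>     typedef struct {           /* stands in for one instrument packet */
>         double time;
>         float  sample[4];
>     } packet_t;
> 
>     void create_and_append(hid_t file, const packet_t *p)
>     {
>         hsize_t adims[1] = {4};
>         hid_t atype = H5Tarray_create2(H5T_NATIVE_FLOAT, 1, adims);
>         hid_t ptype = H5Tcreate(H5T_COMPOUND, sizeof(packet_t));
>         H5Tinsert(ptype, "time",   HOFFSET(packet_t, time),   H5T_NATIVE_DOUBLE);
>         H5Tinsert(ptype, "sample", HOFFSET(packet_t, sample), atype);
> 
>         /* One table per instrument; chunks of 512 packets, no compression. */
>         hid_t pt = H5PTcreate_fl(file, "instrument_A", ptype, 512, -1);
>         H5PTappend(pt, 1, p);
> 
>         H5PTclose(pt);
>         H5Tclose(ptype);
>         H5Tclose(atype);
>     }
> 
> In practice the table is created once at startup and held open, with packets
> appended (H5PTappend) as they arrive from the instrument.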
> 
> Metadata is held in Attributes and other Packet Tables. I've created a 
> standard across the program, with specifics defined for each instrument.
> 
> I end up storing each individual instrument's data in its own file. In most 
> cases, a single thread processes and stores data, so I don't have to worry 
> about synchronization (as much).
> 
> 
> I believe you'll want to store each data type in its own dataset or file,
> both for the ability to search by data type and because of data-length
> issues. How are you expecting to search?
> 
> In my case, we allow users to 'play back' the data. I have the time scale as 
> a separate dataset so I can do random access lookups without having to load 
> large data records to find a specific time.
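> 
> As a sketch of that lookup (the dataset names "time" and "data" are just
> placeholders): read the small time dataset, binary-search it, then pull only
> the packet you need.
> 
>     #include <stdlib.h>
>     #include <hdf5.h>
>     #include <hdf5_hl.h>
> 
>     /* Find the last record at or before time t and read just that one
>        packet - the large data records are never scanned. */
>     hsize_t read_at_time(hid_t file, double t, void *packet)
>     {
>         hid_t    dset  = H5Dopen2(file, "time", H5P_DEFAULT);
>         hid_t    space = H5Dget_space(dset);
>         hssize_t n     = H5Sget_simple_extent_npoints(space);
>         double  *times = malloc((size_t)n * sizeof *times);
>         H5Dread(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL,
>                 H5P_DEFAULT, times);
> 
>         hsize_t lo = 0, hi = (hsize_t)n;       /* binary search */
>         while (lo + 1 < hi) {
>             hsize_t mid = lo + (hi - lo) / 2;
>             if (times[mid] <= t) lo = mid; else hi = mid;
>         }
>         free(times);
>         H5Sclose(space);
>         H5Dclose(dset);
> 
>         hid_t pt = H5PTopen(file, "data");     /* the packet table */
>         H5PTread_packets(pt, lo, 1, packet);   /* random-access read */
>         H5PTclose(pt);
>         return lo;
>     }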
> 
> 
> Scott
> 
>> -----Original Message-----
>> From: [email protected] [mailto:[email protected]]
>> On Behalf Of Val Schmidt
>> Sent: Thursday, February 10, 2011 6:10 PM
>> To: [email protected]
>> Subject: [Hdf-forum] hdf suitability for packetized data
>> 
>> Hello everyone,
>> 
>> I am new to HDF and am trying to understand whether or not it might be a
>> suitable file format for my application. The data I'm interested in storing
>> is usually written by the collecting instrument to basic binary files of
>> concatenated packets (think C structures), each of which contains a header
>> with a time stamp, packet format, packet identifier, and packet size,
>> followed by the data itself (arrays) and associated metadata. There are tens
>> of packet types that may come in any order; they are usually written to the
>> file sequentially. Packets contain 10-100 fields, some of which may be
>> arrays of data of various sizes.
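>> 
>> For concreteness, a generic header of the sort I mean - the field names and
>> widths here are hypothetical, not any particular vendor's format:
>> 
>>     #include <stdint.h>
>> 
>>     struct packet_header {
>>         uint64_t timestamp;  /* time stamp */
>>         uint16_t format;     /* packet format/version */
>>         uint16_t id;         /* packet type identifier */
>>         uint32_t size;       /* bytes of packet body that follow */
>>     };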
>> 
>> This format allows one to index a file relatively quickly by passing
>> through it and parsing only the headers (sketched below). One can then use
>> the index to pull subsets of the data in a non-linear fashion, sometimes
>> simultaneously from multiple threads, for quite fast reading. The problem
>> is that every instrument manufacturer has its own method of encoding
>> packets, and a single format is needed for archival purposes.
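>> 
>> The indexing pass amounts to something like this, using the hypothetical
>> header above (error handling and endianness issues omitted):
>> 
>>     #include <stdio.h>
>> 
>>     /* Record each packet body's file offset so packets can later be
>>        read in any order - or from several threads - with a single
>>        fseek+fread each. */
>>     long index_packets(FILE *f, struct packet_header *hdrs,
>>                        long *offsets, long max)
>>     {
>>         long n = 0;
>>         while (n < max && fread(&hdrs[n], sizeof hdrs[n], 1, f) == 1) {
>>             offsets[n] = ftell(f);             /* start of body */
>>             fseek(f, hdrs[n].size, SEEK_CUR);  /* skip to next header */
>>             n++;
>>         }
>>         return n;
>>     }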
>> 
>> My question to you is: how might a similar model be implemented in HDF5
>> such that the same kind of indexing and parallel data retrieval is
>> possible? What I want to avoid is needing to read through a file
>> sequentially to reach the fields I want to extract.
>> 
>> It seems like HDF5 should handle this kind of thing well, but because I am
>> inexperienced, and because most folks using it seem to be storing
>> relatively small numbers of very large arrays (imagery, in many cases)
>> rather than relatively large numbers of small records with fewer fields
>> and smaller arrays, it is not clear to me how such an implementation might
>> perform. So I guess I'm also asking: what is the relative penalty for
>> writing lots of small sets of data?
>> 
>> I hope this makes sense.
>> 
>> Thanks in advance,
>> 
>> Val

------------------------------------------------------
Val Schmidt
CCOM/JHC
University of New Hampshire
Chase Ocean Engineering Lab
24 Colovos Road
Durham, NH 03824
e: vschmidt [AT] ccom.unh.edu
m: 614.286.3726


