Hm.
This brings up a related question. I like the hierarchical structure of HDF
files and the file-system-like organization it brings, but it raises the
question: can you do very fast queries using object names? For example, can you
use wildcards the way you might within a file system to pull data, e.g.
/root/group/packet-* ?
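
If the library doesn't support this directly, I suppose one could fake it by
iterating over link names and matching them against a pattern - something like
this untested sketch (using the path and pattern from my example above):

#include <stdio.h>
#include <fnmatch.h>
#include "hdf5.h"

/* Callback: report every link whose name matches the pattern in op_data. */
static herr_t match_name(hid_t group, const char *name,
                         const H5L_info_t *info, void *op_data)
{
    const char *pattern = (const char *)op_data;
    if (fnmatch(pattern, name, 0) == 0)
        printf("matched: %s\n", name);  /* or open it: H5Dopen2(group, name, ...) */
    return 0;                           /* 0 = keep iterating */
}

/* Walk /root/group and report links matching "packet-*". */
static void find_packets(hid_t file_id)
{
    char  pattern[] = "packet-*";
    hid_t gid = H5Gopen2(file_id, "/root/group", H5P_DEFAULT);
    H5Literate(gid, H5_INDEX_NAME, H5_ITER_NATIVE, NULL, match_name, pattern);
    H5Gclose(gid);
}

But I'd much rather learn there's a built-in way to do it fast.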
-Val 

    
On Feb 11, 2011, at 11:54 AM, Mitchell, Scott - IS wrote:

> The first search is pretty straightforward. My link is simple: there's a 1:1
> correspondence between the record numbers in the time scale and the dataset.
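> 
> In rough terms the lookup is just this (an untested sketch; the dataset path,
> file handle, and time bounds t0/t1 are placeholders):
> 
> /* Scan the small time-scale dataset to turn a time window [t0, t1] into a
>  * starting record and a count; file_id is an already-open file. */
> hid_t   ts = H5Dopen2(file_id, "/inst1/time", H5P_DEFAULT);
> hid_t   sp = H5Dget_space(ts);
> hsize_t n;
> H5Sget_simple_extent_dims(sp, &n, NULL);
> 
> double *t = malloc(n * sizeof *t);
> H5Dread(ts, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, t);
> 
> hsize_t start = 0, count = 0;
> while (start < n && t[start] < t0) start++;                  /* first record >= t0 */
> while (start + count < n && t[start + count] <= t1) count++; /* records in window  */
> /* 'start' and 'count' now index the data packet table the same way. */
> 
> free(t); H5Sclose(sp); H5Dclose(ts);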
> 
> The two other searches have to be brute-forced from within the Packet Table
> interface (H5PT) by iterating over each record just to pull the individual
> field(s). There may be a better way from the dataset interface (H5D). I've
> stuck with the PT interface because I generally grab the whole dataset and it
> simplifies the process of adding new data.
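> 
> I think the H5D route would be to read with a memory compound that contains
> only the member you want, along these lines (untested; the dataset path and
> field name are made up):
> 
> hid_t   dset   = H5Dopen2(file_id, "/inst1/data", H5P_DEFAULT);
> hid_t   fspace = H5Dget_space(dset);
> hsize_t nrec;
> H5Sget_simple_extent_dims(fspace, &nrec, NULL);
> 
> /* memory type holding just the one member, matched to the file type by name */
> hid_t onefield = H5Tcreate(H5T_COMPOUND, sizeof(double));
> H5Tinsert(onefield, "temperature", 0, H5T_NATIVE_DOUBLE);
> 
> double *temps = malloc(nrec * sizeof *temps);
> H5Dread(dset, onefield, H5S_ALL, H5S_ALL, H5P_DEFAULT, temps);
> 
> H5Tclose(onefield); H5Sclose(fspace); H5Dclose(dset);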
> 
> 
> 
> Scott
> 
>> -----Original Message-----
>> From: [email protected] [mailto:[email protected]]
>> On Behalf Of Val Schmidt
>> Sent: Friday, February 11, 2011 11:28 AM
>> To: HDF Users Discussion List
>> Subject: Re: [Hdf-forum] hdf suitability for packetized data
>> 
>> Your question is a good one.
>> 
>> I would need to be able to pull a full record (or set of records) within a
>> set of time bounds.
>> I would need to be able to pull some field from all records for all times, as
>> a time series.
>> I might need to be able to pull all the fields within some field range for all
>> times.
>> 
>> I'm thinking of something similar to what you have done (I think) - that is,
>> to self-index the file. The index would be in its own dataset with an array
>> of time records and perhaps a few other fields and relative links (I forget
>> what HDF5 calls them) to the actual data records.
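>> 
>> Concretely, I'm picturing an index dataset of {time, reference} records, with
>> each reference pointing at the dataset that holds that packet - a rough,
>> untested sketch (I believe HDF5 calls these object references; all names are
>> made up):
>> 
>> typedef struct {
>>     double     time;
>>     hobj_ref_t ref;       /* HDF5 object reference to the packet's dataset */
>> } index_rec_t;
>> 
>> hid_t itype = H5Tcreate(H5T_COMPOUND, sizeof(index_rec_t));
>> H5Tinsert(itype, "time", HOFFSET(index_rec_t, time), H5T_NATIVE_DOUBLE);
>> H5Tinsert(itype, "ref",  HOFFSET(index_rec_t, ref),  H5T_STD_REF_OBJ);
>> 
>> index_rec_t rec;
>> rec.time = 1297443600.0;                                    /* made-up time */
>> H5Rcreate(&rec.ref, file_id, "/raw/packet-000123", H5R_OBJECT, -1);
>> /* ...append rec to the index dataset... */
>> 
>> /* later: open the referenced packet without knowing its path */
>> hid_t target = H5Rdereference(file_id, H5R_OBJECT, &rec.ref);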
>> 
>> -Val
>> 
>> On Feb 11, 2011, at 10:56 AM, Mitchell, Scott - IS wrote:
>> 
>>> I'm doing something similar to what you are looking at. I have data coming
>>> in from multiple instruments, which go through processing and result in one
>>> or several C# structures/arrays. In my example each instrument type has a
>>> structure containing Packet Tables with associated time axes/scales. The
>>> packet table structure mimics the instrument data structures.
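>>> 
>>> Setting one of these up looks roughly like this (from memory and untested;
>>> the record struct and names are simplified stand-ins):
>>> 
>>> typedef struct {
>>>     double time;
>>>     float  range;
>>>     int    quality;
>>> } inst_rec_t;
>>> 
>>> hid_t rtype = H5Tcreate(H5T_COMPOUND, sizeof(inst_rec_t));
>>> H5Tinsert(rtype, "time",    HOFFSET(inst_rec_t, time),    H5T_NATIVE_DOUBLE);
>>> H5Tinsert(rtype, "range",   HOFFSET(inst_rec_t, range),   H5T_NATIVE_FLOAT);
>>> H5Tinsert(rtype, "quality", HOFFSET(inst_rec_t, quality), H5T_NATIVE_INT);
>>> 
>>> /* one packet table for the records, one for the matching time axis */
>>> hid_t data_pt = H5PTcreate_fl(file_id, "/inst1/data", rtype, 4096, -1);
>>> hid_t time_pt = H5PTcreate_fl(file_id, "/inst1/time", H5T_NATIVE_DOUBLE,
>>>                               4096, -1);
>>> 
>>> inst_rec_t rec = { 1297443600.0, 12.5f, 1 };   /* made-up sample */
>>> H5PTappend(data_pt, 1, &rec);
>>> H5PTappend(time_pt, 1, &rec.time);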
>>> 
>>> Metadata is held in Attributes and other Packet Tables. I've created a
>>> standard across the program, with specifics defined for each instrument.
>>> 
>>> I end up storing each individual instrument's data in its own file. In most
>>> cases, a single thread processes and stores data, so I don't have to worry
>>> about synchronization (as much).
>>> 
>>> 
>>> I believe you'll want to store each data type in its own dataset or file,
>>> both for the ability to search by data type and because of data-length
>>> issues. How are you expecting to search?
>>> 
>>> In my case, we allow users to 'play back' the data. I have the time scale as
>>> a separate dataset so I can do random-access lookups without having to load
>>> large data records to find a specific time.
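>>> 
>>> That is, search the small time dataset for the record index, then read just
>>> that packet from the big table - something like (untested, using the struct
>>> and table from the sketch above):
>>> 
>>> inst_rec_t rec;
>>> H5PTread_packets(data_pt, i, 1, &rec);   /* i = record index found via time */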
>>> 
>>> 
>>> Scott
>>> 
>>>> -----Original Message-----
>>>> From: [email protected] [mailto:[email protected]]
>>>> On Behalf Of Val Schmidt
>>>> Sent: Thursday, February 10, 2011 6:10 PM
>>>> To: [email protected]
>>>> Subject: [Hdf-forum] hdf suitability for packetized data
>>>> 
>>>> Hello everyone,
>>>> 
>>>> I am new to HDF and am trying to understand whether it might be a suitable
>>>> file format for my application. The data I'm interested in storing is usually
>>>> written by the collecting instrument to basic binary files of concatenated
>>>> packets (think C structures), each of which contains a header with a time
>>>> stamp, packet format, packet identifier, and packet size, followed by the
>>>> data itself (arrays) and associated metadata. There are tens of types of
>>>> packets that may come in any order, and they are usually written to the file
>>>> sequentially. Packets contain 10-100 fields, some of which may be arrays of
>>>> data of various sizes.
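>>>> 
>>>> For concreteness, a header looks roughly like this (the exact fields and
>>>> widths vary by manufacturer; these names are made up):
>>>> 
>>>> #include <stdint.h>
>>>> 
>>>> typedef struct {
>>>>     double   timestamp;   /* time of validity               */
>>>>     uint16_t format;      /* packet format / version        */
>>>>     uint16_t id;          /* packet type identifier         */
>>>>     uint32_t size;        /* bytes of payload that follow   */
>>>> } packet_header_t;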
>>>> 
>>>> This format allows one to index a file relatively quickly by passing through
>>>> the file and parsing only these headers. Then one can use the index to pull
>>>> subsets of the data in a non-linear fashion, sometimes simultaneously in
>>>> multiple threads for quite fast reading. The problem is that every instrument
>>>> manufacturer has its own method of encoding packets, and a single format is
>>>> needed for archival purposes.
>>>> 
>>>> My question to you is: how might a similar model be implemented in HDF5 such
>>>> that the same kind of indexing and parallel data retrieval is possible? What
>>>> I want to avoid is the need to read through a file sequentially to get to the
>>>> fields I want to extract.
>>>> 
>>>> It seems like HDF5 should handle this kind of thing well, but because I am
>>>> inexperienced, and because most folks using it seem to be storing relatively
>>>> small numbers of very large arrays (imagery in many cases) rather than
>>>> relatively large numbers of small records and small arrays, it is not clear
>>>> to me how such an implementation might perform. So I guess I'm also asking:
>>>> what is the relative penalty for writing lots of small sets of data?
>>>> 
>>>> I hope this makes sense.
>>>> 
>>>> Thanks in advance,
>>>> 
>>>> Val

------------------------------------------------------
Val Schmidt
CCOM/JHC
University of New Hampshire
Chase Ocean Engineering Lab
24 Colovos Road
Durham, NH 03824
e: vschmidt [AT] ccom.unh.edu
m: 614.286.3726


