Quincey,

Thanks for the tip. A quick read of "HDF5 Dataset Region References" does
look promising.
Would you say the main benefit of region references is more direct (i.e.,
efficient) construction of the related hyperslabs upon data retrieval,
versus saving start/stop element numbers within the index element and
having to build a hyperslab region from that information alone?

Kirk

> Hi Kirk,
>
> On Apr 16, 2010, at 11:29 AM, Kirk Harrison wrote:
>
>> To account for possible gaps of data within the stream I need to have
>> a way of indexing blocks of data within the (single) dataset that I
>> write the data to. (I elected to use a fixed contiguous dataset
>> approach, as opposed to a dynamically sized one using chunks, so that
>> I can better manage the disk space and circular buffer.)
>>
>> I am in the process of setting up a (dynamic/chunked) indexing dataset
>> to access the dataset used to capture the data stream. What I envision
>> is each record in the index table containing elements such as:
>> - Start_time
>> - Stop_time
>> - Num_Records
>> - Reference (??? See question 3 below)
>> Each index record would be used to describe a region in the contiguous
>> dataset used to capture the streamed data (which would further be used
>> by a client to set up hyperslabs to request specific groups of data).
>>
>> I am still in the process of learning about HDF5 links. I was thinking
>> I might be able to simply have the index table contain soft links to
>> the stream dataset, possibly with properties (Start_time, Stop_time,
>> Num_Records, etc.).
>>
>> With all of this being said:
>> 1) Is there a better way to do this within HDF5 (i.e., some built-in
>> capability to index in this fashion which I have yet to discover)?
>> 2) Can links even be placed in a table like this (pointing to a
>> specific record in a dataset)?
>> 3) What is the recommended mechanism for "referencing" a particular
>> record within a dataset?
>
> I think the answer to all three questions is: you should use a dataset
> region reference for this purpose
> (http://www.hdfgroup.org/HDF5/doc/RM/RM_H5R.html#Reference-Create).
>
> Quincey
>
>> Kirk
>>
>> -----Original Message-----
>> From: Mark Miller [mailto:[email protected]]
>> Sent: Friday, March 26, 2010 3:14 PM
>> To: Kirk Harrison
>> Subject: RE: [Hdf-forum] HDF5 Circular Database
>>
>> If you encounter serious performance issues at the I/O level, I'd be
>> interested to know, and may have some suggestions for improvement if
>> you do.
>>
>> Mark
>>
>> On Fri, 2010-03-26 at 11:02, Kirk Harrison wrote:
>>> Mark and Quincey,
>>>
>>> Thanks! I will look into hyperslabs as well. I finally located a
>>> reference under HDF5 Advanced Topics.
>>> I have multiple streams of time-series data that result from
>>> different types of processing from the system. The data differs such
>>> that I will probably try several approaches with each stream in an
>>> attempt to optimize performance. In the past I have manually
>>> programmed this type of binary file-based solution and am eager to
>>> see what capability and performance I can get out of HDF5 for this
>>> type of domain. (I also have an associate independently evaluating
>>> MySQL for comparison.)
>>>
>>> Kirk
>>>
>>> -----Original Message-----
>>> From: Mark Miller [mailto:[email protected]]
>>> Sent: Thursday, March 25, 2010 5:59 PM
>>> To: [email protected]
>>> Cc: HDF Users Discussion List
>>> Subject: Re: [Hdf-forum] HDF5 Circular Database
>>>
>>> Well, I had envisioned your 'buffer' as being a collection of
>>> datasets.
>>>
>>> You could just have a single dataset that is the 'buffer', and then
>>> you'd have to use hyperslabs or selections to write to just a portion
>>> of that dataset (as Quincey already mentioned).
>>>
>>> HTH
>>>
>>> Mark
>>>
>>> On Thu, 2010-03-25 at 14:03, [email protected] wrote:
>>>> Mark,
>>>>
>>>> I am new to HDF5 and still working my way through the tutorials. It
>>>> looks promising thus far, but I have been concerned about the
>>>> circular database implementation.
>>>> The dataset size will be static, based upon the time duration for
>>>> which I want to provide data lookup and the data output rate of the
>>>> sensors. I suppose what I need to figure out then, based on your
>>>> approach, is how to "seek" to the appropriate location (record)
>>>> within the dataset for continued writing of the data. This is
>>>> probably where your suggestion of adding an attribute (time of
>>>> acquisition) comes into play.
>>>>
>>>> Thanks for the reassurance and the tips,
>>>> Kirk
>>>>
>>>>> You should be able to do that pretty easily with HDF5.
>>>>>
>>>>> If you are absolutely certain your datasets will never, ever change
>>>>> in size, you could create an 'empty' database by going through and
>>>>> creating N datasets (H5Dcreate) of the desired size
>>>>> (H5Screate_simple) but not actually writing anything to any of the
>>>>> datasets.
>>>>>
>>>>> Then, as time evolves, you pick a particular dataset to open
>>>>> (H5Dopen), write to (writing afresh if the dataset has yet to be
>>>>> written to, or overwriting what's already there if it has -- it
>>>>> makes no difference to the application; it just calls H5Dwrite),
>>>>> and close (H5Dclose).
>>>>>
>>>>> If you think you might want to be able to vary dataset size over
>>>>> time, use 'chunked' datasets (H5Pset_chunk) instead of the default
>>>>> (contiguous).
>>>>> If you need to maintain other tidbits of information about the
>>>>> datasets, such as time of acquisition, sensor # (whatever), and
>>>>> that data is 'small' (<16 KB), attach attributes (H5Acreate) to
>>>>> your datasets and overwrite those attributes as you would datasets
>>>>> (H5Aopen, H5Awrite, H5Aclose).
>>>>>
>>>>> Mark
>>>>>
>>>>> On Thu, 2010-03-25 at 13:11, [email protected] wrote:
>>>>>> I am interested in using HDF5 to manage sensor data within a
>>>>>> continuous circular database/file. I wish to define a database of
>>>>>> a fixed size to manage a finite amount of historical data. When
>>>>>> the database file is full (i.e., it reaches the defined capacity)
>>>>>> I would like to begin overwriting the oldest data within the file.
>>>>>> This is for an application for a system where I only care about
>>>>>> the most recent data over a specific duration, with obvious
>>>>>> constraints on the amount of storage available.
>>>>>>
>>>>>> Does HDF5 have such capability, or is there a recommended approach
>>>>>> or any suggestions?
>>>>>>
>>>>>> Best Regards,
>>>>>> Kirk Harrison
>>>>>>
>>>>>> _______________________________________________
>>>>>> Hdf-forum is for HDF software users discussion.
>>>>>> [email protected]
>>>>>> http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
>>>>> --
>>>>> Mark C. Miller, Lawrence Livermore National Laboratory
>>>>> ================!!LLNL BUSINESS ONLY!!================
>>>>> [email protected] urgent: [email protected]
>>>>> T:8-6 (925)-423-5901 M/W/Th:7-12,2-7 (530)-753-851
