Hi Kirk,
On Apr 16, 2010, at 11:29 AM, Kirk Harrison wrote:
> To account for possible gaps in the data within the stream, I need a way of
> indexing blocks of data within the (single) dataset that I write the data
> to. (I elected to use a fixed contiguous dataset approach, as opposed to a
> dynamically sized one using chunks, so that I can better manage the disk
> space and circular buffer.)
>
> I am in the process of setting up a (dynamic/chunked) indexing dataset to
> access the dataset used to capture the datastream. What I envision is each
> record in the index table containing elements such as:
> - Start_time
> - Stop_time
> - Num_Records
> - Reference (??? See question 3 below)
> Each index record would be used to describe a region in the contiguous
> dataset used to capture the streamed data (which a client would in turn use
> to set up hyperslabs to request specific groups of data).
>
> I am still in the process of learning about HDF5 Links. I was thinking I
> might be able to simply have the index table contain soft links to the
> stream dataset, possibly with properties (Start_time, Stop_time,
> Num_Records, etc.).
>
> With all of this being said:
> 1) Is there a better way to do this within HDF5 (i.e., some built-in
> capability to index in this fashion which I have yet to discover)?
> 2) Can links even be placed in a table like this (pointing to a specific
> record in a dataset)?
> 3) What is the recommended mechanism for "referencing" a particular record
> within a dataset?
I think the answer to all three questions is: you should use a dataset
region reference for this purpose
(http://www.hdfgroup.org/HDF5/doc/RM/RM_H5R.html#Reference-Create).
Quincey
> Kirk
>
> -----Original Message-----
> From: Mark Miller [mailto:[email protected]]
> Sent: Friday, March 26, 2010 3:14 PM
> To: Kirk Harrison
> Subject: RE: [Hdf-forum] HDF5 Circular Database
>
> If you encounter serious performance issues at the I/O level, I'd be
> interested to know and may have some suggestions for improvement if you
> do.
>
> Mark
>
> On Fri, 2010-03-26 at 11:02, Kirk Harrison wrote:
>> Mark and Quincey,
>>
>> Thanks! I will look into Hyperslabs as well. I finally located a reference
>> under HDF5 Advanced Topics.
>> I have multiple streams of time series data that result from different types
>> of processing from the system. The data differs such that I will probably
>> try several approaches with each stream in an attempt to optimize
>> performance. In the past I have manually programmed this type of binary
>> file-based solution and am eager to see what capability and performance I
>> can get out of HDF5 for this type of domain. (I also have an associate
>> independently evaluating MySQL for comparison.)
>>
>> Kirk
>>
>> -----Original Message-----
>> From: Mark Miller [mailto:[email protected]]
>> Sent: Thursday, March 25, 2010 5:59 PM
>> To: [email protected]
>> Cc: HDF Users Discussion List
>> Subject: Re: [Hdf-forum] HDF5 Circular Database
>>
>> Well, I had envisioned your 'buffer' as being a collection of datasets.
>>
>> You could just have a single dataset that is the 'buffer' and then you'd
>> have to use hyperslabs or selections to write to just a portion of that
>> dataset (as Quincey already mentioned).
>>
>> HTH
>>
>> Mark
>>
>> On Thu, 2010-03-25 at 14:03, [email protected] wrote:
>>> Mark,
>>>
>>> I am new to HDF5 and still working my way through the Tutorials. It looks
>>> promising thus far, but I have been concerned about the Circular Database
>>> implementation.
>>> The dataset size will be static based upon the time duration for which I
>>> want to provide data lookup and the data output rate of the sensors. I
>>> suppose what I need to figure out then, based on your approach, is how to
>>> "seek" to the appropriate location (record) within the dataset for
>>> continued writing of the data. This is probably where your suggestion of
>>> adding an attribute (time of acquisition) comes into play.
>>>
>>> Thanks for the reassurance and the tips,
>>> Kirk
>>>
>>>> You should be able to do that pretty easily with HDF5.
>>>>
>>>> If you are absolutely certain your datasets will never, ever change in
>>>> size, you could create an 'empty' database by going through and creating
>>>> N datasets (H5Dcreate) of desired size (H5Screate_simple) but not
>>>> actually writing anything to any of the datasets.
>>>>
>>>> Then, as time evolves, you pick a particular dataset to open (H5Dopen),
>>>> write to (writing afresh if the dataset has yet to be written to, or
>>>> overwriting what's already there if it has; it makes no difference to the
>>>> application, which just calls H5Dwrite), and H5Dclose.
>>>>
>>>> If you think you might want to be able to vary dataset size over time,
>>>> use 'chunked' datasets (H5Pset_chunk) instead of the default
>>>> (contiguous). If you need to maintain other tidbits of information about
>>>> the datasets, such as time of acquisition, sensor # (whatever), and that
>>>> data is 'small' (<16kb), attach attributes (H5Acreate) to your datasets
>>>> and overwrite those attributes as you would datasets (H5Aopen, H5Awrite,
>>>> H5Aclose).
>>>>
>>>> Mark
>>>>
>>>>
>>>> On Thu, 2010-03-25 at 13:11, [email protected] wrote:
>>>>> I am interested in using HDF5 to manage sensor data within a continuous
>>>>> Circular Database/File. I wish to define a database of a fixed size to
>>>>> manage a finite amount of historical data. When the database file is full
>>>>> (i.e., reaches the defined capacity), I would like to begin overwriting
>>>>> the oldest data within the file. This is for an application for a system
>>>>> where I only care about the most recent data over a specific duration,
>>>>> with obvious constraints on the amount of storage available.
>>>>>
>>>>> Does HDF5 have such a capability, or does anyone have a recommended
>>>>> approach or suggestions?
>>>>>
>>>>> Best Regards,
>>>>> Kirk Harrison
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Hdf-forum is for HDF software users discussion.
>>>>> [email protected]
>>>>> http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
>>>> --
>>>> Mark C. Miller, Lawrence Livermore National Laboratory
>>>> ================!!LLNL BUSINESS ONLY!!================
>>>> [email protected] urgent: [email protected]
>>>> T:8-6 (925)-423-5901 M/W/Th:7-12,2-7 (530)-753-851
>>>>
>>>
>