To account for possible gaps in the data stream, I need a way of indexing blocks of data within the (single) dataset to which I write the data. (I elected to use a fixed, contiguous dataset rather than a dynamically sized one using chunks so that I can better manage the disk space and the circular buffer.)
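The wrap-around bookkeeping this implies can be sketched in a few lines (pure Python; the capacity and record counts below are invented for illustration):

```python
# Sketch of the wrap-around arithmetic for a fixed-size record buffer.
# CAPACITY and the record counts are made-up values for illustration.

CAPACITY = 10  # total records the contiguous dataset can hold


def write_slices(start, nrecords, capacity=CAPACITY):
    """Return the (offset, count) pieces needed to write `nrecords`
    records beginning at logical position `start`, wrapping at
    `capacity`.  Assumes nrecords <= capacity; at most two pieces
    are ever needed."""
    start %= capacity
    first = min(nrecords, capacity - start)
    pieces = [(start, first)]
    if nrecords > first:              # the write wrapped past the end
        pieces.append((0, nrecords - first))
    return pieces


# A write of 4 records beginning at position 8 wraps into two pieces:
print(write_slices(8, 4))  # -> [(8, 2), (0, 2)]
```

Each (offset, count) piece maps directly onto one hyperslab selection, so a wrapped write costs at most two selections.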
I am in the process of setting up a (dynamic/chunked) indexing dataset to access the dataset used to capture the data stream. What I envision is each record in the index table containing elements such as:

- Start_time
- Stop_time
- Num_Records
- Reference (see question 3 below)

Each index record would be used to describe a region in the continuous dataset used to capture the streamed data (which would in turn be used by a client to set up hyperslabs to request specific groups of data).

I am still in the process of learning about HDF5 links. I was thinking I might be able to simply have the index table contain soft links to the stream dataset, possibly with properties (Start_time, Stop_time, Num_Records, etc.).

With all of this said:

1) Is there a better way to do this within HDF5 (i.e., some built-in capability to index in this fashion that I have yet to discover)?
2) Can links even be placed in a table like this (pointing to a specific record in a dataset)?
3) What is the recommended mechanism for referencing a particular record within a dataset?

Kirk

-----Original Message-----
From: Mark Miller [mailto:[email protected]]
Sent: Friday, March 26, 2010 3:14 PM
To: Kirk Harrison
Subject: RE: [Hdf-forum] HDF5 Circular Database

If you encounter serious performance issues at the I/O level, I'd be interested to know, and I may have some suggestions for improvement if you do.

Mark

On Fri, 2010-03-26 at 11:02, Kirk Harrison wrote:
> Mark and Quincey,
>
> Thanks! I will look into hyperslabs as well. I finally located a reference
> under HDF5 Advanced Topics.
> I have multiple streams of time-series data that result from different types
> of processing in the system. The data differs enough that I will probably
> try several approaches with each stream in an attempt to optimize
> performance. In the past I have manually programmed this type of binary
> file-based solution and am eager to see what capability and performance I
> can get out of HDF5 for this type of domain.
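One candidate answer to question 3 above is an HDF5 *dataset region reference* (H5Rcreate with H5R_DATASET_REGION in the C API): unlike a soft link, which can only name a whole object by path, a region reference points at a specific selection of records inside a dataset, and it can be stored as a field of a compound index record. A hedged sketch using h5py (assuming h5py is available; the file and dataset names are invented):

```python
import numpy as np
import h5py

# Sketch: store one index record per region of the stream dataset,
# with a region reference pointing at that region's records.
with h5py.File("stream.h5", "w") as f:
    stream = f.create_dataset("stream", shape=(1000,), dtype="f8")
    stream[100:200] = 1.0                    # pretend records 100..199 arrived

    # Compound index-record type: times, count, and a region reference.
    idx_t = np.dtype([("start_time", "f8"),
                      ("stop_time", "f8"),
                      ("num_records", "i8"),
                      ("ref", h5py.special_dtype(ref=h5py.RegionReference))])
    index = f.create_dataset("index", shape=(1,), dtype=idx_t)

    ref = stream.regionref[100:200]          # reference to records 100..199
    index[0] = (0.0, 99.0, 100, ref)         # times here are made up

with h5py.File("stream.h5", "r") as f:
    rec = f["index"][0]
    region = f[rec["ref"]][rec["ref"]]       # dereference, then read the region
    print(region.shape)                      # -> (100,)
```

(Storing plain start/count integers in the index record and building hyperslab selections from them client-side, as described above, works just as well; region references simply let HDF5 carry the selection for you.)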
> (I also have an associate independently evaluating MySQL for comparison.)
>
> Kirk
>
> -----Original Message-----
> From: Mark Miller [mailto:[email protected]]
> Sent: Thursday, March 25, 2010 5:59 PM
> To: [email protected]
> Cc: HDF Users Discussion List
> Subject: Re: [Hdf-forum] HDF5 Circular Database
>
> Well, I had envisioned your 'buffer' as being a collection of datasets.
>
> You could just have a single dataset that is the 'buffer', and then you'd
> have to use hyperslabs or selections to write to just a portion of that
> dataset (as Quincey already mentioned).
>
> HTH
>
> Mark
>
> On Thu, 2010-03-25 at 14:03, [email protected] wrote:
> > Mark,
> >
> > I am new to HDF5 and still working my way through the tutorials. It looks
> > promising thus far, but I have been concerned about the circular-database
> > implementation.
> > The dataset size will be static, based upon the time duration for which I
> > want to provide data lookup and the data output rate of the sensors. I
> > suppose what I need to figure out, then, based on your approach, is how to
> > "seek" to the appropriate location (record) within the dataset for
> > continued writing of the data. This is probably where your suggestion of
> > adding an attribute (time of acquisition) comes into play.
> >
> > Thanks for the reassurance and the tips,
> > Kirk
> >
> > > You should be able to do that pretty easily with HDF5.
> > >
> > > If you are absolutely certain your datasets will never, ever change in
> > > size, you could create an 'empty' database by going through and creating
> > > N datasets (H5Dcreate) of the desired size (H5Screate_simple) but not
> > > actually writing anything to any of the datasets.
> > >
> > > Then, as time evolves, you pick a particular dataset to open (H5Dopen),
> > > write to it (writing afresh if the dataset has yet to be written to, or
> > > overwriting what's already there if it has -- it makes no difference to
> > > the application; it just calls H5Dwrite), and H5Dclose it.
> > >
> > > If you think you might want to be able to vary dataset size over time,
> > > use 'chunked' datasets (H5Pset_chunk) instead of the default
> > > (contiguous). If you need to maintain other tidbits of information about
> > > the datasets, such as time of acquisition or sensor # (whatever), and
> > > that data is 'small' (<16 KB), attach attributes (H5Acreate) to your
> > > datasets and overwrite those attributes as you would datasets (H5Aopen,
> > > H5Awrite, H5Aclose).
> > >
> > > Mark
> > >
> > > On Thu, 2010-03-25 at 13:11, [email protected] wrote:
> > >> I am interested in using HDF5 to manage sensor data within a continuous
> > >> circular database/file. I wish to define a database of a fixed size to
> > >> manage a finite amount of historical data. When the database file is
> > >> full (i.e., reaches the defined capacity), I would like to begin
> > >> overwriting the oldest data within the file. This is for an application
> > >> on a system where I only care about the most recent data over a specific
> > >> duration, with obvious constraints on the amount of storage available.
> > >>
> > >> Does HDF5 have such a capability, or does anyone have a recommended
> > >> approach/suggestions?
> > >>
> > >> Best Regards,
> > >> Kirk Harrison
> > >>
> > >> _______________________________________________
> > >> Hdf-forum is for HDF software users discussion.
> > >> [email protected]
> > >> http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
> > > --
> > > Mark C. Miller, Lawrence Livermore National Laboratory
> > > ================!!LLNL BUSINESS ONLY!!================
> > > [email protected] urgent: [email protected]
> > > T:8-6 (925)-423-5901 M/W/Th:7-12,2-7 (530)-753-851

--
Mark C. Miller, Lawrence Livermore National Laboratory
================!!LLNL BUSINESS ONLY!!================
[email protected] urgent: [email protected]
T:8-6 (925)-423-5901 M/W/Th:7-12,2-7 (530)-753-851
