To account for possible gaps in the data stream, I need a way of indexing blocks of data within the (single) dataset to which I write the data. (I elected to use a fixed, contiguous dataset rather than a dynamically sized one using chunks so that I can better manage the disk space and the circular buffer.)
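The wrap-around bookkeeping this implies can be sketched in a few lines (pure Python; the capacity and record counts below are invented for illustration):

```python
# Sketch of the wrap-around arithmetic for a fixed-size record buffer.
# CAPACITY and the record counts are made-up values for illustration.

CAPACITY = 10  # total records the contiguous dataset can hold


def write_slices(start, nrecords, capacity=CAPACITY):
    """Return the (offset, count) pieces needed to write `nrecords`
    records beginning at logical position `start`, wrapping at
    `capacity`.  Assumes nrecords <= capacity; at most two pieces
    are ever needed."""
    start %= capacity
    first = min(nrecords, capacity - start)
    pieces = [(start, first)]
    if nrecords > first:              # the write wrapped past the end
        pieces.append((0, nrecords - first))
    return pieces


# A write of 4 records beginning at position 8 wraps into two pieces:
print(write_slices(8, 4))  # -> [(8, 2), (0, 2)]
```

Each (offset, count) piece maps directly onto one hyperslab selection, so a wrapped write costs at most two selections.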
I am in the process of setting up a (dynamic/chunked) indexing dataset to access the dataset used to capture the data stream. What I envision is each record in the index table containing elements such as:

- Start_time
- Stop_time
- Num_Records
- Reference (see question 3 below)

Each index record would be used to describe a region in the continuous dataset used to capture the streamed data (which would in turn be used by a client to set up hyperslabs to request specific groups of data).

I am still in the process of learning about HDF5 links. I was thinking I might be able to simply have the index table contain soft links to the stream dataset, possibly with properties (Start_time, Stop_time, Num_Records, etc.).

With all of this said:

1) Is there a better way to do this within HDF5 (i.e., some built-in capability to index in this fashion that I have yet to discover)?
2) Can links even be placed in a table like this (pointing to a specific record in a dataset)?
3) What is the recommended mechanism for referencing a particular record within a dataset?

Kirk

-----Original Message-----
From: Mark Miller [mailto:[email protected]]
Sent: Friday, March 26, 2010 3:14 PM
To: Kirk Harrison
Subject: RE: [Hdf-forum] HDF5 Circular Database

If you encounter serious performance issues at the I/O level, I'd be interested to know, and I may have some suggestions for improvement if you do.

Mark

On Fri, 2010-03-26 at 11:02, Kirk Harrison wrote:
> Mark and Quincey,
>
> Thanks! I will look into hyperslabs as well. I finally located a reference
> under HDF5 Advanced Topics.
> I have multiple streams of time-series data that result from different types
> of processing in the system. The data differs enough that I will probably
> try several approaches with each stream in an attempt to optimize
> performance. In the past I have manually programmed this type of binary
> file-based solution and am eager to see what capability and performance I
> can get out of HDF5 for this type of domain.
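One candidate answer to question 3 above is an HDF5 *dataset region reference* (H5Rcreate with H5R_DATASET_REGION in the C API): unlike a soft link, which can only name a whole object by path, a region reference points at a specific selection of records inside a dataset, and it can be stored as a field of a compound index record. A hedged sketch using h5py (assuming h5py is available; the file and dataset names are invented):

```python
import numpy as np
import h5py

# Sketch: store one index record per region of the stream dataset,
# with a region reference pointing at that region's records.
with h5py.File("stream.h5", "w") as f:
    stream = f.create_dataset("stream", shape=(1000,), dtype="f8")
    stream[100:200] = 1.0                    # pretend records 100..199 arrived

    # Compound index-record type: times, count, and a region reference.
    idx_t = np.dtype([("start_time", "f8"),
                      ("stop_time", "f8"),
                      ("num_records", "i8"),
                      ("ref", h5py.special_dtype(ref=h5py.RegionReference))])
    index = f.create_dataset("index", shape=(1,), dtype=idx_t)

    ref = stream.regionref[100:200]          # reference to records 100..199
    index[0] = (0.0, 99.0, 100, ref)         # times here are made up

with h5py.File("stream.h5", "r") as f:
    rec = f["index"][0]
    region = f[rec["ref"]][rec["ref"]]       # dereference, then read the region
    print(region.shape)                      # -> (100,)
```

(Storing plain start/count integers in the index record and building hyperslab selections from them client-side, as described above, works just as well; region references simply let HDF5 carry the selection for you.)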
> (I also have an associate independently evaluating MySQL for comparison.)
>
> Kirk
>
> -----Original Message-----
> From: Mark Miller [mailto:[email protected]]
> Sent: Thursday, March 25, 2010 5:59 PM
> To: [email protected]
> Cc: HDF Users Discussion List
> Subject: Re: [Hdf-forum] HDF5 Circular Database
>
> Well, I had envisioned your 'buffer' as being a collection of datasets.
>
> You could just have a single dataset that is the 'buffer', and then you'd
> have to use hyperslabs or selections to write to just a portion of that
> dataset (as Quincey already mentioned).
>
> HTH
>
> Mark
>
> On Thu, 2010-03-25 at 14:03, [email protected] wrote:
> > Mark,
> >
> > I am new to HDF5 and still working my way through the tutorials. It looks
> > promising thus far, but I have been concerned about the circular-database
> > implementation.
> > The dataset size will be static, based upon the time duration for which I
> > want to provide data lookup and the data output rate of the sensors. I
> > suppose what I need to figure out, then, based on your approach, is how to
> > "seek" to the appropriate location (record) within the dataset for
> > continued writing of the data. This is probably where your suggestion of
> > adding an attribute (time of acquisition) comes into play.
> >
> > Thanks for the reassurance and the tips,
> > Kirk
> >
> > > You should be able to do that pretty easily with HDF5.
> > >
> > > If you are absolutely certain your datasets will never, ever change in
> > > size, you could create an 'empty' database by going through and creating
> > > N datasets (H5Dcreate) of the desired size (H5Screate_simple) but not
> > > actually writing anything to any of the datasets.
> > >
> > > Then, as time evolves, you pick a particular dataset to open (H5Dopen),
> > > write to it (writing afresh if the dataset has yet to be written to, or
> > > overwriting what's already there if it has -- it makes no difference to
> > > the application; it just calls H5Dwrite), and H5Dclose it.
> > >
> > > If you think you might want to be able to vary dataset size over time,
> > > use 'chunked' datasets (H5Pset_chunk) instead of the default
> > > (contiguous). If you need to maintain other tidbits of information about
> > > the datasets, such as time of acquisition or sensor # (whatever), and
> > > that data is 'small' (<16 KB), attach attributes (H5Acreate) to your
> > > datasets and overwrite those attributes as you would datasets (H5Aopen,
> > > H5Awrite, H5Aclose).
> > >
> > > Mark
> > >
> > > On Thu, 2010-03-25 at 13:11, [email protected] wrote:
> > >> I am interested in using HDF5 to manage sensor data within a continuous
> > >> circular database/file. I wish to define a database of a fixed size to
> > >> manage a finite amount of historical data. When the database file is
> > >> full (i.e., reaches the defined capacity), I would like to begin
> > >> overwriting the oldest data within the file. This is for an application
> > >> on a system where I only care about the most recent data over a specific
> > >> duration, with obvious constraints on the amount of storage available.
> > >>
> > >> Does HDF5 have such a capability, or does anyone have a recommended
> > >> approach/suggestions?
> > >>
> > >> Best Regards,
> > >> Kirk Harrison
> > >>
> > >> _______________________________________________
> > >> Hdf-forum is for HDF software users discussion.
> > >> [email protected]
> > >> http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
> > > --
> > > Mark C. Miller, Lawrence Livermore National Laboratory
> > > ================!!LLNL BUSINESS ONLY!!================
> > > [email protected] urgent: [email protected]
> > > T:8-6 (925)-423-5901 M/W/Th:7-12,2-7 (530)-753-851

--
Mark C. Miller, Lawrence Livermore National Laboratory
================!!LLNL BUSINESS ONLY!!================
[email protected] urgent: [email protected]
T:8-6 (925)-423-5901 M/W/Th:7-12,2-7 (530)-753-851
