Quincey,

Thanks for the tip. A quick read of "HDF5 Dataset Region References" does
look promising.
Would you say the main benefit of region references is more direct (i.e.,
efficient) construction of the related hyperslabs upon data retrieval,
versus saving start/stop element numbers within the index element and
having to build a hyperslab region from that information alone?

Kirk

> Hi Kirk,
>
> On Apr 16, 2010, at 11:29 AM, Kirk Harrison wrote:
>
>> To account for possible gaps of data within the stream I need to have
>> a way of indexing blocks of data within the (single) dataset that I
>> write the data to. (I elected to use a fixed contiguous dataset
>> approach, as opposed to a dynamically sized one using chunks, so that
>> I can better manage the disk space and circular buffer.)
>>
>> I am in the process of setting up a (dynamic/chunked) indexing dataset
>> to access the dataset used to capture the data stream. What I envision
>> is each record in the index table containing elements such as:
>> - Start_time
>> - Stop_time
>> - Num_Records
>> - Reference (??? See question 3 below)
>> Each index record would be used to describe a region in the contiguous
>> dataset used to capture the streamed data (which would further be used
>> by a client to set up hyperslabs to request specific groups of data).
>>
>> I am still in the process of learning about HDF5 links. I was thinking
>> I might be able to simply have the index table contain soft links to
>> the stream dataset, possibly with properties (Start_time, Stop_time,
>> Num_Records, etc.).
>>
>> With all of this being said:
>> 1) Is there a better way to do this within HDF5 (i.e., some built-in
>> capability to index in this fashion which I have yet to discover)?
>> 2) Can links even be placed in a table like this (pointing to a
>> specific record in a dataset)?
>> 3) What is the recommended mechanism for "referencing" a particular
>> record within a dataset?
>
> I think the answer to all three questions is: you should use a dataset
> region reference for this purpose
> (http://www.hdfgroup.org/HDF5/doc/RM/RM_H5R.html#Reference-Create).
>
> Quincey
>
>> Kirk
>>
>> -----Original Message-----
>> From: Mark Miller [mailto:[email protected]]
>> Sent: Friday, March 26, 2010 3:14 PM
>> To: Kirk Harrison
>> Subject: RE: [Hdf-forum] HDF5 Circular Database
>>
>> If you encounter serious performance issues at the I/O level, I'd be
>> interested to know, and may have some suggestions for improvement if
>> you do.
>>
>> Mark
>>
>> On Fri, 2010-03-26 at 11:02, Kirk Harrison wrote:
>>> Mark and Quincey,
>>>
>>> Thanks! I will look into hyperslabs as well. I finally located a
>>> reference under HDF5 Advanced Topics.
>>> I have multiple streams of time-series data that result from
>>> different types of processing from the system. The data differs such
>>> that I will probably try several approaches with each stream in an
>>> attempt to optimize performance. In the past I have manually
>>> programmed this type of binary file-based solution and am eager to
>>> see what capability and performance I can get out of HDF5 for this
>>> type of domain. (I also have an associate independently evaluating
>>> MySQL for comparison.)
>>>
>>> Kirk
>>>
>>> -----Original Message-----
>>> From: Mark Miller [mailto:[email protected]]
>>> Sent: Thursday, March 25, 2010 5:59 PM
>>> To: [email protected]
>>> Cc: HDF Users Discussion List
>>> Subject: Re: [Hdf-forum] HDF5 Circular Database
>>>
>>> Well, I had envisioned your 'buffer' as being a collection of
>>> datasets.
>>>
>>> You could just have a single dataset that is the 'buffer', and then
>>> you'd have to use hyperslabs or selections to write to just a portion
>>> of that dataset (as Quincey already mentioned).
>>>
>>> HTH
>>>
>>> Mark
>>>
>>> On Thu, 2010-03-25 at 14:03, [email protected] wrote:
>>>> Mark,
>>>>
>>>> I am new to HDF5 and still working my way through the tutorials. It
>>>> looks promising thus far, but I have been concerned about the
>>>> circular database implementation.
>>>> The dataset size will be static, based upon the time duration for
>>>> which I want to provide data lookup and the data output rate of the
>>>> sensors. I suppose what I need to figure out then, based on your
>>>> approach, is how to "seek" to the appropriate location (record)
>>>> within the dataset for continued writing of the data. This is
>>>> probably where your suggestion of adding an attribute (time of
>>>> acquisition) comes into play.
>>>>
>>>> Thanks for the reassurance and the tips,
>>>> Kirk
>>>>
>>>>> You should be able to do that pretty easily with HDF5.
>>>>>
>>>>> If you are absolutely certain your datasets will never, ever change
>>>>> in size, you could create an 'empty' database by going through and
>>>>> creating N datasets (H5Dcreate) of the desired size
>>>>> (H5Screate_simple) but not actually writing anything to any of the
>>>>> datasets.
>>>>>
>>>>> Then, as time evolves, you pick a particular dataset to open
>>>>> (H5Dopen), write to (writing afresh if the dataset has yet to be
>>>>> written to, or overwriting what's already there if it has -- it
>>>>> makes no difference to the application; it just calls H5Dwrite),
>>>>> and close (H5Dclose).
>>>>>
>>>>> If you think you might want to be able to vary dataset size over
>>>>> time, use 'chunked' datasets (H5Pset_chunk) instead of the default
>>>>> (contiguous).
>>>>> If you need to maintain other tidbits of information about the
>>>>> datasets, such as time of acquisition, sensor # (whatever), and
>>>>> that data is 'small' (<16 KB), attach attributes (H5Acreate) to
>>>>> your datasets and overwrite those attributes as you would datasets
>>>>> (H5Aopen, H5Awrite, H5Aclose).
>>>>>
>>>>> Mark
>>>>>
>>>>> On Thu, 2010-03-25 at 13:11, [email protected] wrote:
>>>>>> I am interested in using HDF5 to manage sensor data within a
>>>>>> continuous circular database/file. I wish to define a database of
>>>>>> a fixed size to manage a finite amount of historical data. When
>>>>>> the database file is full (i.e., it reaches the defined capacity)
>>>>>> I would like to begin overwriting the oldest data within the file.
>>>>>> This is for an application for a system where I only care about
>>>>>> the most recent data over a specific duration, with obvious
>>>>>> constraints on the amount of storage available.
>>>>>>
>>>>>> Does HDF5 have such capability, or is there a recommended approach
>>>>>> or any suggestions?
>>>>>>
>>>>>> Best Regards,
>>>>>> Kirk Harrison
>>>>>>
>>>>>> _______________________________________________
>>>>>> Hdf-forum is for HDF software users discussion.
>>>>>> [email protected]
>>>>>> http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
>>>>> --
>>>>> Mark C. Miller, Lawrence Livermore National Laboratory
>>>>> ================!!LLNL BUSINESS ONLY!!================
>>>>> [email protected] urgent: [email protected]
>>>>> T:8-6 (925)-423-5901 M/W/Th:7-12,2-7 (530)-753-851
