So each time series is a dataset in the file's root group, and you thus end up with a million datasets in that one group?

Does performance also degrade when you use the HDF5 command-line tools, such as h5ls or h5dump, on such a file?

It might make sense to rearrange your datasets hierarchically so that you have only, say, 1000 datasets per group: create 1000 groups, each covering a range of time series, and you still get your million datasets, but with only 1000 links per group.
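A rough sketch of such a bucketing scheme (the "/g%04d" naming and the divisor of 1000 are just illustrative assumptions, not anything from your code):

    #include <stdio.h>     /* snprintf */
    #include "hdf5.h"

    /* Return the sub-group that holds series number series_index,
     * creating it on first use; 1000 series per group. */
    hid_t open_bucket_group(hid_t file_id, int series_index)
    {
        char  group_name[16];
        hid_t group_id;

        snprintf(group_name, sizeof(group_name), "/g%04d", series_index / 1000);

        /* Try to open an existing group quietly; create it if that fails. */
        H5E_BEGIN_TRY {
            group_id = H5Gopen2(file_id, group_name, H5P_DEFAULT);
        } H5E_END_TRY;
        if (group_id < 0)
            group_id = H5Gcreate2(file_id, group_name, H5P_DEFAULT,
                                  H5P_DEFAULT, H5P_DEFAULT);
        return group_id;
    }

You would then pass the returned group id to H5Dcreate2() in place of file_id, and call H5Gclose() on it once the dataset is created.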

If all time series are of the same length, another option is to put them all into one dataset, leaving one dimension unlimited and extending it as new time series come in; each time series then becomes a single chunk (one row) of that 2-dimensional dataset.
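If you go that route, a minimal sketch could look like the following (the dataset name "/all_series" and the one-row-per-series layout are my assumptions):

    #include "hdf5.h"

    /* Create a 0 x nVals dataset whose first dimension can grow without
     * limit; each chunk holds exactly one series. */
    hid_t create_series_store(hid_t file_id, hsize_t nVals)
    {
        hsize_t dims[2]    = { 0, nVals };
        hsize_t maxdims[2] = { H5S_UNLIMITED, nVals };
        hsize_t chunk[2]   = { 1, nVals };
        hid_t   space, dcpl, dset;

        space = H5Screate_simple(2, dims, maxdims);
        dcpl  = H5Pcreate(H5P_DATASET_CREATE);
        H5Pset_chunk(dcpl, 2, chunk);
        dset  = H5Dcreate2(file_id, "/all_series", H5T_NATIVE_DOUBLE,
                           space, H5P_DEFAULT, dcpl, H5P_DEFAULT);
        H5Pclose(dcpl);
        H5Sclose(space);
        return dset;
    }

    /* Append one series of nVals doubles as a new row. */
    herr_t append_series(hid_t dset_id, const double *vals, hsize_t nVals)
    {
        hsize_t dims[2], start[2], count[2];
        hid_t   filespace, memspace;
        herr_t  status;

        /* Find out how many rows are stored already. */
        filespace = H5Dget_space(dset_id);
        H5Sget_simple_extent_dims(filespace, dims, NULL);
        H5Sclose(filespace);

        /* Grow the dataset by one row ... */
        start[0] = dims[0];  start[1] = 0;
        count[0] = 1;        count[1] = nVals;
        dims[0] += 1;
        status = H5Dset_extent(dset_id, dims);
        if (status < 0) return status;

        /* ... and write the new row through a hyperslab selection. */
        filespace = H5Dget_space(dset_id);
        H5Sselect_hyperslab(filespace, H5S_SELECT_SET, start, NULL, count, NULL);
        memspace = H5Screate_simple(2, count, NULL);
        status = H5Dwrite(dset_id, H5T_NATIVE_DOUBLE, memspace, filespace,
                          H5P_DEFAULT, vals);
        H5Sclose(memspace);
        H5Sclose(filespace);
        return status;
    }

This avoids creating a new object in the file per series entirely; adding a series then costs just one chunk write plus a small extent update.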

Cheers,
           Werner



On 11.06.2014 00:40, Tony Kennedy - Ventana Systems UK wrote:
I've got to store anything from ten to possibly millions of time series. At the moment, I create a simple dataspace and store each time series as its own dataset, which all works. But once I get to a few thousand time series, performance drops off to the point where HDF5 is no longer an option for me.

Can anyone suggest anything I can try? The function that writes the data to the HDF5 file is below; it's pretty simple.

Any suggestions at all are more than welcome.

All the best,

Tony.


int AddTimeSeriesToHDF5File(hid_t file_id, const char *pStrVarName, double *pVars, int nVals, BOOL bCompress)
{
    hsize_t dims[1];
    hid_t   dataspace_id;
    hid_t   dataset_id;
    hid_t   aid2;
    hid_t   attr2;
    herr_t  status;
    hid_t   plist_id;       /* dataset creation property list (compression) */
    hsize_t cdims[1];

    dims[0] = nVals;
    dataspace_id = H5Screate_simple(1, dims, NULL);

    if (bCompress)
    {
        /* Compression requires a chunked layout; use one chunk per series. */
        plist_id = H5Pcreate(H5P_DATASET_CREATE);
        cdims[0] = nVals;
        status = H5Pset_chunk(plist_id, 1, cdims);
        status = H5Pset_deflate(plist_id, 9);
    }
    else
    {
        plist_id = H5P_DEFAULT;
    }

    /* Create the dataset. */
    dataset_id = H5Dcreate2(file_id, pStrVarName, H5T_NATIVE_DOUBLE,
                            dataspace_id, H5P_DEFAULT, plist_id, H5P_DEFAULT);

    /* Attach a scalar attribute recording the number of points. */
    aid2  = H5Screate(H5S_SCALAR);
    attr2 = H5Acreate2(dataset_id, "Number of points", H5T_NATIVE_INT,
                       aid2, H5P_DEFAULT, H5P_DEFAULT);
    status = H5Awrite(attr2, H5T_NATIVE_INT, &nVals);
    status = H5Aclose(attr2);
    status = H5Sclose(aid2);

    /* Write the time series data. */
    status = H5Dwrite(dataset_id, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL,
                      H5P_DEFAULT, pVars);

    status = H5Sclose(dataspace_id);
    status = H5Dclose(dataset_id);
    if (plist_id != H5P_DEFAULT)
        status = H5Pclose(plist_id);

    return status;
}



--
___________________________________________________________________________
Dr. Werner Benger                Visualization Research
Center for Computation & Technology at Louisiana State University (CCT/LSU)
2019  Digital Media Center, Baton Rouge, Louisiana 70803
Tel.: +1 225 578 4809                        Fax.: +1 225 578 5362

