Hi Matt,
On Apr 25, 2012, at 6:38 AM, Matt Calder wrote:
> Quincey,
>
> Thanks for the reply. My current solution involves breaking the data into
> smaller datasets, and accepting the cost of rewriting the smaller sets. In
> effect, I swapped chunks for datasets and did the optimizations at the
> dataset level.
That'll definitely work. :-)
> I would be interested in at least knowing in theory how to patch the source
> code to implement this and would be more than willing to share the outcome of
> any such work. Knowing where to begin would be a big help.
You'll need to look at the code in src/H5Dchunk.c and src/H5Dbtree.c
where the chunks are operated on. However, first, can you write up a short
description of exactly what the feature you are planning to add would do, and
the interface you'd like to have for that functionality and send it to me?
That'll help guide how the feature should be implemented. Feel free to email
me off-list at: [email protected].
Quincey
> Matt
>
> On Wed, Apr 25, 2012 at 7:25 AM, Quincey Koziol <[email protected]> wrote:
> Hi Matt,
>
> On Apr 11, 2012, at 7:23 AM, Matt Calder wrote:
>
> > Hi,
> >
> > I have a set of one dimensional chunked datasets of modest size (larger
> > than available memory). I am looking for the most efficient (mostly in
> > terms of time) way to insert new values into the dataset. A representative
> > example of what I am trying to do is: I have a dataset of 10 billion sorted
> > doubles, and I have a vector (in memory) of 1000 random (sorted) doubles,
> > and I want to insert the values of the vector into the dataset. One way
> > would be to write out a new dataset reading the larger one and merging the
> > vector values as they come up. I could improve this by carefully writing
> > into the existing dataset but it would still involve a lot of data
> > movement. I was hoping that because the dataset is chunked there may be
> > other ways to accomplish the insertion. Thanks for any suggestions.
>
> The HDF5 library doesn't currently perform this sort of insertion on
> chunked datasets, although it's technically feasible. If you'd like to work
> on algorithms to add that sort of operation, we'd be happy to guide you
> through the source code so you can create a patch that adds the
> capability. Alternatively, if you'd like to fund this work, that's
> possible as well.
>
> Quincey
>
>
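The merge-rewrite approach Matt describes (stream through the large sorted
dataset chunk by chunk, merging in the small sorted vector as its values come
up) can be sketched in plain Python. This is only an illustration of the
algorithm, not HDF5 or h5py API: the lists below stand in for chunk reads,
and a real implementation would read each chunk from the source dataset and
append the merged output to a new dataset.

```python
import heapq

def merged_stream(chunks, new_values):
    """Merge a chunked sorted sequence with a sorted in-memory vector.

    `chunks` is an iterable of sorted lists (stand-ins for reading one
    HDF5 chunk at a time); `new_values` is the sorted vector to insert.
    Yields the fully merged values in order, touching each chunk once,
    so memory use stays at one chunk plus the vector.
    """
    def chunk_values():
        for chunk in chunks:
            yield from chunk

    # heapq.merge lazily merges already-sorted inputs.
    yield from heapq.merge(chunk_values(), new_values)

# Example: a "dataset" of three chunks and a small vector to insert.
chunks = [[1.0, 3.0, 5.0], [7.0, 9.0, 11.0], [13.0, 15.0, 17.0]]
vector = [4.5, 10.0]
result = list(merged_stream(chunks, vector))
# result is the sorted union of all chunk values and the vector.
```

This still rewrites the whole dataset once, which is the data-movement cost
the thread is discussing; avoiding that rewrite is exactly what would require
the chunk-level changes in src/H5Dchunk.c and src/H5Dbtree.c that Quincey
mentions.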
> _______________________________________________
> Hdf-forum is for HDF software users discussion.
> [email protected]
> http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
>