Quincey,

Thanks for the reply. My current solution involves breaking the data into
smaller datasets, and accepting the cost of rewriting the smaller sets. In
effect, I swapped chunks for datasets and did the optimizations at the
dataset level.
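
For concreteness, here is a minimal sketch of that scheme in Python, with
plain lists standing in for the small datasets. The names insert_sorted and
max_len are made up for illustration, and no HDF5 calls are involved; in a
real file each rebuilt segment would be written back as its own dataset.

```python
import bisect
from heapq import merge


def insert_sorted(segments, new_values, max_len):
    """Insert sorted new_values into a list of sorted, non-empty segments.

    Each segment is a plain list standing in for one small dataset.
    Only the segments that receive values are rebuilt; a rebuilt segment
    longer than max_len (assumed >= 1) is split into several.
    Returns the original indices of the segments that were rewritten.
    """
    if not new_values:
        return []
    if not segments:
        # Empty store: just cut the new values into segments.
        segments[:] = [list(new_values[i:i + max_len])
                       for i in range(0, len(new_values), max_len)]
        return []

    # First element of each segment, used to route a value to its segment.
    firsts = [seg[0] for seg in segments]

    # Group the new values by target segment. Values below every segment
    # go to segment 0. Each group stays sorted because new_values is sorted.
    pending = {}
    for v in new_values:
        idx = max(bisect.bisect_right(firsts, v) - 1, 0)
        pending.setdefault(idx, []).append(v)

    # Rebuild from the back so a split does not shift indices still to come.
    for idx in sorted(pending, reverse=True):
        merged = list(merge(segments[idx], pending[idx]))
        pieces = [merged[i:i + max_len]
                  for i in range(0, len(merged), max_len)]
        segments[idx:idx + 1] = pieces

    return sorted(pending)
```

The cost of an insertion is then proportional to the size of the segments
that are hit, not to the size of the whole collection, which is the
trade-off I accepted.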

I would still be interested in knowing, at least in theory, how to patch the
source code to implement this, and I would be more than willing to share the
outcome of any such work. Knowing where to begin would be a big help.

Matt

On Wed, Apr 25, 2012 at 7:25 AM, Quincey Koziol <[email protected]> wrote:

> Hi Matt.
>
> On Apr 11, 2012, at 7:23 AM, Matt Calder wrote:
>
> > Hi,
> >
> > I have a set of one dimensional chunked datasets of modest size (larger
> > than available memory). I am looking for the most efficient (mostly in
> > terms of time) way to insert new values into the dataset. A representative
> > example of what I am trying to do is: I have a dataset of 10 billion sorted
> > doubles, and I have a vector (in memory) of 1000 random (sorted) doubles,
> > and I want to insert the values of the vector into the dataset. One way
> > would be to write out a new dataset reading the larger one and merging the
> > vector values as they come up. I could improve this by carefully writing
> > into the existing dataset but it would still involve a lot of data
> > movement. I was hoping that because the dataset is chunked there may be
> > other ways to accomplish the insertion. Thanks for any suggestions.
>
>         The HDF5 library doesn't currently perform this sort of insertion
> on chunked datasets, although it's technically feasible.  If you'd like to
> work on algorithms to add that sort of operation, we'd be happy to work
> with you to guide you through the source code to create a patch that adds
> the capability.  Alternatively, if you'd like to fund this activity, that's
> possible with us also.
>
>        Quincey
>
>
> _______________________________________________
> Hdf-forum is for HDF software users discussion.
> [email protected]
> http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
>
