Hi Matt.

On Apr 11, 2012, at 7:23 AM, Matt Calder wrote:

> Hi,
> 
> I have a set of one dimensional chunked datasets of modest size (larger than 
> available memory). I am looking for the most efficient (mostly in terms of 
> time) way to insert new values into the dataset. A representative example of 
> what I am trying to do is: I have a dataset of 10 billion sorted doubles, and 
> I have a vector (in memory) of 1000 random (sorted) doubles, and I want to 
> insert the values of the vector into the dataset. One way would be to write 
> out a new dataset, reading the larger one and merging in the vector values as 
> they come up. I could improve on this by writing carefully into the existing 
> dataset, but it would still involve a lot of data movement. I was hoping that 
> because the dataset is chunked there may be other ways to accomplish the 
> insertion. Thanks for any suggestions.

        The HDF5 library doesn't currently support this sort of insertion on 
chunked datasets, although it's technically feasible.  If you'd like to work on 
algorithms to add that kind of operation, we'd be happy to guide you through 
the source code so you can create a patch that adds the capability.  
Alternatively, if you'd like to fund this work, that's also possible with us.
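
        In the meantime, here's a rough, untested sketch of the rewrite-and-merge 
approach you described, assuming h5py and hypothetical file/dataset names 
("data.h5", "sorted"); the new values are stood in by a random sorted vector:

import numpy as np
import h5py

BLOCK = 1_000_000  # read the source in chunk-sized blocks to bound memory use

with h5py.File("data.h5", "r+") as f:
    src = f["sorted"]                                  # existing sorted dataset
    new = np.sort(np.random.default_rng(0).random(1000))  # stand-in for your in-memory vector

    n = src.shape[0]
    dst = f.create_dataset("sorted_merged", shape=(n + new.size,),
                           dtype=src.dtype, chunks=True)

    out_pos = 0   # next write position in the merged dataset
    new_pos = 0   # next unconsumed element of `new`
    for start in range(0, n, BLOCK):
        block = src[start:start + BLOCK]
        # Pull in any new values that fall at or below this block's maximum.
        hi = np.searchsorted(new, block[-1], side="right")
        if hi > new_pos:
            block = np.concatenate([block, new[new_pos:hi]])
            block.sort()
            new_pos = hi
        dst[out_pos:out_pos + block.size] = block
        out_pos += block.size

    # Any remaining new values are larger than everything in the source.
    if new_pos < new.size:
        dst[out_pos:] = new[new_pos:]

Working one block at a time keeps memory bounded, but it does rewrite all of 
the data once, which is the cost you were hoping to avoid.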

        Quincey

