On Mar 17, 2010, at 5:23 AM, Lee, David wrote:
> I need to be updating some largish (1G+) sets of documents fairly atomically.
> That is, I'd like to update all the documents and perform some operations
> like adding properties etc,
> then all at once make the updates visible. The update process could take
> several hours.
> Currently this document set shares the same forest as other document sets.
> It's not possible to split these up because the app needs to query across
> all the document sets.
>
> Any suggestions on how to accomplish this?
What happens if you try loading everything as part of a single XCC call passing
the large array of files?
If you want to follow Wayne's advice on using collections, I suppose you'd want
to put each batch of docs in a uniquely named collection. Then you can run
your queries against fn:collection($seq) where $seq is the sequence of
collections that have been loaded so far. Or, perhaps more simply, you can do
a cts:not-query() against cts:collection-query("latest") and thus exclude
the most recent batch while still matching all the docs loaded before. That
effectively keeps the newest collection invisible until you're ready. Handy,
efficient, and if each batch gets its own ID then you can easily exclude any
batch.
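A rough sketch of that approach in server-side XQuery (the URIs and
collection names here are just illustrative, and the final step assumes the
URI lexicon is enabled so cts:uris() works):

```xquery
(: During the bulk load, tag each new document with a per-batch
   collection plus a shared "latest" marker collection. :)
xdmp:document-insert(
  "/docs/example.xml",
  <doc/>,
  xdmp:default-permissions(),
  ("latest", "batch-2010-03-17"))

(: Meanwhile, app queries exclude the in-progress batch: :)
cts:search(
  fn:doc(),
  cts:and-query((
    cts:word-query("foo"),
    cts:not-query(cts:collection-query("latest")))))

(: When the load completes, strip the "latest" tag so the whole
   batch becomes visible at once (requires the URI lexicon): :)
for $uri in cts:uris((), (), cts:collection-query("latest"))
return xdmp:document-remove-collections($uri, "latest")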
Point-in-time queries would do something similar, and are suitable if you're
always doing just one bulk load at a time. You can then use the query
timestamp to control visibility.
-jh-
_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general