You might consider creating a CPF (Content Processing Framework) process in which 
the last step either updates the document to put it in a different collection (as 
Wayne suggested) or creates a new document (possibly in a different directory, as 
you are already using directories).  CPF handles a lot of the complexity of 
building a resilient content-processing application.
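A minimal sketch of what such a final CPF action module might look like (the "live" collection name and the pipeline wiring are assumptions, not from any actual configuration):

```xquery
xquery version "1.0-ml";
(: Hypothetical final CPF action: move the finished document into a
   "live" collection so queries constrained to that collection see it. :)
import module namespace cpf = "http://marklogic.com/cpf"
  at "/MarkLogic/cpf/cpf.xqy";

declare variable $cpf:document-uri as xs:string external;
declare variable $cpf:transition as node() external;

if (cpf:check-transition($cpf:document-uri, $cpf:transition))
then (
  xdmp:document-set-collections($cpf:document-uri, ("live")),
  cpf:success($cpf:document-uri, $cpf:transition, ())
)
else ()
```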

-Danny

From: [email protected] 
[mailto:[email protected]] On Behalf Of Lee, David
Sent: Wednesday, March 17, 2010 1:38 PM
To: General Mark Logic Developer Discussion
Subject: RE: [MarkLogic Dev General] "Hot Swapping" large data sets.

Thanks.  I'm using directories currently ... I wish I could just rename them, but 
that's not possible. I could, however, make "what directory to use" a variable 
that I set in a document somewhere.
Good idea.
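That "directory pointer" idea could be sketched like this (URIs and element names are hypothetical):

```xquery
(: Store the name of the live directory in a small pointer document.
   Run this as its own request, separate from the query below. :)
xdmp:document-insert("/config/live-dir.xml", <live-dir>/data/setA/</live-dir>)

(: At query time, resolve the pointer and constrain the search to it :)
let $dir := fn:string(fn:doc("/config/live-dir.xml")/live-dir)
return
  cts:search(fn:doc(),
    cts:and-query((
      cts:directory-query($dir, "infinity"),
      cts:word-query("example"))))
```

Swapping the data set in or out then becomes a single small update to the pointer document rather than a rename.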
Not sure about collections ... need to look more into those.

Point-in-time queries ... interesting, I thought about those but never tried 
them.

Thanks for the ideas!


From: [email protected] 
[mailto:[email protected]] On Behalf Of Wayne Feick
Sent: Wednesday, March 17, 2010 3:54 PM
To: General Mark Logic Developer Discussion
Subject: Re: [MarkLogic Dev General] "Hot Swapping" large data sets.

I'd suggest looking into collections or directories to constrain queries to one 
set or the other such that one is the live set you're serving up and the other 
is the set you're updating.
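As a sketch of the collection approach (the "live" and "staging" names and document URI are hypothetical):

```xquery
(: Serve reads from the "live" collection only ... :)
cts:search(fn:collection("live"), cts:word-query("example"))

(: ... while the update process writes into a "staging" collection,
   then flips each document over when the whole batch is ready: :)
xdmp:document-set-collections("/data/doc1.xml", ("live"))
```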

You might also consider using the Library Services API where the updates 
operate on the most recent version of each document while you serve up whatever 
version was most recent at some fixed point in time before the updates began.
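A rough sketch of that flow with the Library Services (dls) API, using hypothetical URIs and content:

```xquery
import module namespace dls = "http://marklogic.com/xdmp/dls"
  at "/MarkLogic/dls.xqy";

(: Put a document under version management, retaining version history :)
dls:document-insert-and-manage("/data/doc1.xml", fn:true(), <doc>v1</doc>)

(: Later updates create new versions; readers pinned to an earlier
   version keep seeing it while the update is in flight :)
dls:document-checkout-update-checkin("/data/doc1.xml", <doc>v2</doc>,
  "nightly refresh", fn:true())
```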

The third approach would be to use point in time queries (you'll need to set 
the merge timestamp on the database's merge policy page) such that you're 
serving up content from a fixed commit timestamp before your changes while your 
update process is actively changing the database. We don't generally recommend 
people use point in time queries since there is almost always a better way to 
do what they want, but this particular case is the one situation where it makes 
sense to consider it.
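A sketch of serving reads at a fixed timestamp via xdmp:eval's timestamp option (how the application captures and stores the timestamp before the update begins is an assumption here):

```xquery
(: Read back a commit timestamp saved before the update started,
   then evaluate the query as of that point in time. Requires a
   nonzero merge timestamp so older fragments are retained. :)
let $ts := xs:unsignedLong(fn:doc("/config/serve-timestamp.xml")/ts)
return
  xdmp:eval(
    'cts:search(fn:doc(), cts:word-query("example"))',
    (),
    <options xmlns="xdmp:eval">
      <timestamp>{$ts}</timestamp>
    </options>)
```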

Wayne.


On Wed, 2010-03-17 at 05:23 -0700, Lee, David wrote:
I need to be updating some largish (1G+) sets of documents fairly atomically.

That is, I'd like to update all the documents, perform some operations like 
adding properties, etc., and then all at once make the updates visible. The 
update process could take several hours.

Currently this document set shares the same forest as other document sets. It's 
not possible to split these up because the app needs to cross-query across all 
the document sets.

Any suggestions on how to accomplish this?

----------------------------------------
David A. Lee
Senior Principal Software Engineer
Epocrates, Inc.
[email protected]
812-482-5224

_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general
