Hi Lee,
You mentioned you wanted to group things in "sets". A collection is a set,
so as Wayne said you can probably get a lot of benefit from using
collections (they are lightweight and indexed by MarkLogic Server).
I don't know how you are ingesting the data, but assuming recordLoader or
xdmp:document-load you can even specify the collection at load time with
options.
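
For example, a minimal sketch of a load that tags the document with
collections (the file path and URI here are made up for illustration):

  (: load one document and put it straight into three collections :)
  xdmp:document-load("/data/incoming/phone-bill-2010-03.xml",
    <options xmlns="xdmp:document-load">
      <uri>/bills/phone/2010-03.xml</uri>
      <collections>
        <collection>Mar2010</collection>
        <collection>phone</collection>
        <collection>bills</collection>
      </collections>
    </options>)
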
*Sample use of collections:*
Suppose you have an application that stores letters you receive by regular
mail. A phone bill from Mar2010 could be added to these sets (collections):
- Mar2010
- phone
- bills
Now you would be able to directly find the letters that are bills (using
fn:collection('bills')), or quickly remove a phone bill from Feb2009.
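
For instance, a rough sketch (the collection names match the example above):

  (: letters that are phone bills :)
  cts:search(fn:collection(),
    cts:and-query((cts:collection-query("phone"), cts:collection-query("bills"))))

  (: delete every phone bill from Feb2009 in one pass :)
  for $doc in cts:search(fn:collection(),
    cts:and-query((cts:collection-query("phone"),
                   cts:collection-query("bills"),
                   cts:collection-query("Feb2009"))))
  return xdmp:document-delete(xdmp:node-uri($doc))
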
Collections and directories can be used at the same time, and there are some
really nice things you can do by combining them.
Hope this helps,
Nuno
On Wed, Mar 17, 2010 at 4:38 PM, Lee, David <[email protected]> wrote:
> Thanks. I'm using directories currently ... I wish I could just rename
> them but nope,
>
> but I could make "what directory to use" a variable which I set in a
> document somewhere.
>
> Good idea.
>
> Not sure about collections ... need to look more into those.
>
>
>
> Point in time queries ... interesting, I thought about those but never
> tried them.
>
>
>
> Thanks for the ideas!
>
>
>
>
>
> *From:* [email protected] [mailto:[email protected]]
> *On Behalf Of* Wayne Feick
> *Sent:* Wednesday, March 17, 2010 3:54 PM
> *To:* General Mark Logic Developer Discussion
> *Subject:* Re: [MarkLogic Dev General] "Hot Swapping" large data sets.
>
>
>
> I'd suggest looking into collections or directories to constrain queries to
> one set or the other such that one is the live set you're serving up and the
> other is the set you're updating.
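>
> For example, a rough sketch of the directory flavor (the directory names and
> the idea of reading the live directory from a small config document are just
> one way to wire it up):
>
>   (: a one-document "pointer" tells read queries which directory is live :)
>   let $live-dir := fn:string(fn:doc("/config/live.xml")/config/live-dir)
>   return cts:search(fn:collection(),
>     cts:and-query((cts:directory-query($live-dir, "infinity"),
>                    cts:word-query("phone"))))
>
>   (: when the multi-hour load into the other directory finishes, the swap is
>      a single small update to /config/live.xml :)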
>
> You might also consider using the Library Services API where the updates
> operate on the most recent version of each document while you serve up
> whatever version was most recent at some fixed point in time before the
> updates began.
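>
> Very roughly, and glossing over the checkout/update/checkin steps (the URI
> and content below are placeholders):
>
>   import module namespace dls = "http://marklogic.com/xdmp/dls"
>     at "/MarkLogic/dls.xqy";
>
>   (: put the document under version management, retaining history :)
>   dls:document-insert-and-manage("/bills/phone/2010-03.xml", fn:true(),
>     <bill><period>Mar2010</period></bill>)
>
>   (: in a later read query, a fixed numbered version can keep being served
>      while updates check in newer ones, e.g.
>      dls:document-version("/bills/phone/2010-03.xml", 1) :)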
>
> The third approach would be to use point in time queries (you'll need to
> set the merge timestamp on the database's merge policy page) such that
> you're serving up content from a fixed commit timestamp before your changes
> while your update process is actively changing the database. We don't
> generally recommend people use point in time queries since there is almost
> always a better way to do what they want, but this particular case is the
> one situation where it makes sense to consider it.
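>
> Roughly, a sketch of that pattern (the query text is just a placeholder, and
> it assumes the merge timestamp has been set as described above):
>
>   (: capture the commit timestamp your readers should stick to,
>      e.g. just before the big update starts :)
>   let $ts := xdmp:request-timestamp()
>
>   (: read-only queries evaluated at that fixed timestamp keep seeing the old
>      content while the update process changes the database :)
>   return xdmp:eval('fn:count(fn:collection("bills"))', (),
>     <options xmlns="xdmp:eval">
>       <timestamp>{$ts}</timestamp>
>     </options>)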
>
> Wayne.
>
>
> On Wed, 2010-03-17 at 05:23 -0700, Lee, David wrote:
>
> I need to be updating some largish (1G+) sets of documents fairly
> atomically.
>
> That is, I'd like to update all the documents and perform some operations
> like adding properties etc,
>
> then all at once make the updates visible. The update process could take
> several hours.
>
> Currently this document set shares the same forest as other document sets.
>
> It's not possible to split these up because the app needs to cross-query
> across all the document sets.
>
>
>
> Any suggestions on how to accomplish this?
>
> ----------------------------------------
>
> David A. Lee
>
> Senior Principal Software Engineer
>
> Epocrates, Inc.
>
> [email protected]
>
> 812-482-5224
>
>
>
>
>
_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general