So unfiltered lets one page arbitrarily deep, e.g. to results 1,000,001 to 1,000,010, while filtered may max out the caches much earlier, e.g. around results 200,001 to 200,010, with the next page falling out of cache.
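That deep-paging behavior might be sketched as follows. This is a sketch only: the word query and the author element are hypothetical placeholders, and it assumes an unfiltered search is resolved entirely from the indexes, so only the fragments actually fetched are loaded.

```xquery
(: Sketch only: deep paging over an unfiltered search.
   The word query and the author element are hypothetical. :)
let $results := cts:search(fn:doc(), cts:word-query("history"), "unfiltered")
return
  (: Only these ten fragments are fetched from disk;
     the search itself was resolved from the indexes. :)
  fn:subsequence($results, 1000001, 10)//author
```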
Memory and fragmentation are still the main factors affecting the total records [990,000 to 1,000,010] // authors, because if the fragments are small (KB vs. MB), more of them can be loaded.

P.S. The expanded cache is the one that will be used, i.e. the one that is filled from disk, correct? And range indexes can be used to avoid disk access altogether (for small bits of information)?

--- On Thu, 3/18/10, [email protected] <[email protected]> wrote:

From: [email protected] <[email protected]>
Subject: General Digest, Vol 69, Issue 66
To: [email protected]
Date: Thursday, March 18, 2010, 11:46 AM

Send General mailing list submissions to [email protected]

To subscribe or unsubscribe via the World Wide Web, visit
http://xqzone.com/mailman/listinfo/general
or, via email, send a message with subject or body 'help' to
[email protected]

You can reach the person managing the list at [email protected]

When replying, please edit your Subject line so it is more specific than "Re: Contents of General digest..."

Today's Topics:

   1. RE: Unfiltered ok, but what of fragment loading (Kelly Stirman)
   2. Re: "Hot Swapping" large data sets. (Jason Hunter)
   3. Re: Unfiltered ok, but what of fragment loading (Jason Hunter)
   4. MLSQL - JDOM version? (Wyatt VanderStucken)

----------------------------------------------------------------------

Message: 1
Date: Thu, 18 Mar 2010 10:33:34 -0700
From: Kelly Stirman <[email protected]>
Subject: [MarkLogic Dev General] RE: Unfiltered ok, but what of fragment loading
To: "[email protected]" <[email protected]>

If you want to get only the authors and their values, you should take a look at cts:element-values() or cts:element-attribute-values(). This will require creating a range index on the node where your authors are stored, but it will eliminate the need to pull all the documents into memory.
You can also use cts:frequency() to determine how frequently each author is mentioned across all 300 documents.

Kelly

------------------------------

Message: 2
Date: Thu, 18 Mar 2010 07:07:16 -0700 (PDT)
From: Paul M <[email protected]>
Subject: [MarkLogic Dev General] Unfiltered ok, but what of fragment loading
To: [email protected]

Say I perform an unfiltered search that resolves to 300 fragments. Since it was unfiltered, no fragments needed to be loaded into memory for the *search* itself; only the indexes were used. Now let's say I want the authors from all of these fragments/docs (fragment = doc, since there is no fragmentation policy). Does the data still need to be loaded into memory for all 300 docs, even if I only need a small piece of each? I.e., will the expanded/compressed caches (not certain which?) need to be filled with 300 docs? I.e., even if a search can be performed without pagination, does that not leave one open to blowing out the caches when the data is retrieved from the docs? May pagination still be required? Any information is appreciated...

------------------------------

Message: 2
Date: Thu, 18 Mar 2010 11:07:07 -0700
From: Jason Hunter <[email protected]>
Subject: Re: [MarkLogic Dev General] "Hot Swapping" large data sets.
To: General Mark Logic Developer Discussion <[email protected]>

For a single batch load I like that, but if you do repeated loads you'll have to create new roles for every batch to distinguish the new content from the old. It seems mentally cheaper/lighter to me to use collections. My 2c.

-jh-

On Mar 18, 2010, at 9:47 AM, Danny Sokolsky wrote:

> The URI privilege does not control access to the document; it specifies
> whether you can create a document in that URI space.
>
> You can do what Keith suggests by putting a read permission on each document
> that is associated with a role.
> Then, when you are ready, grant that role to a role your users already have.
> To do this, you would have to add several permissions during the load. For
> example, you might add a read and update permission for a “loader” role, and
> also add a read permission for a “content-user” role. Then, after you are
> satisfied that your content is the way you want it, you can give the
> “content-user” role to the users of your application.
>
> -Danny
>
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Keith L. Breinholt
> Sent: Thursday, March 18, 2010 9:34 AM
> To: General Mark Logic Developer Discussion
> Subject: RE: [MarkLogic Dev General] "Hot Swapping" large data sets.
>
> Another way to let you load and update sets, and only make them visible when
> you are done, is to load the content with a unique URI privilege that is
> assigned to your loader/enricher program.
>
> Then, when you are done and the content is ready, you can add that privilege
> to the role of any users/applications that need to see it. That way only
> completed content is visible, and it appears ‘instantaneously’ when the
> privilege is added to the role.
>
> Keith L. Breinholt
> [email protected]
>
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Jason Hunter
> Sent: Thursday, March 18, 2010 12:10 AM
> To: General Mark Logic Developer Discussion
> Subject: Re: [MarkLogic Dev General] "Hot Swapping" large data sets.
>
> On Mar 17, 2010, at 5:23 AM, Lee, David wrote:
>
>> I need to update some largish (1 GB+) sets of documents fairly atomically.
>> That is, I'd like to update all the documents and perform some operations,
>> like adding properties etc., then all at once make the updates visible. The
>> update process could take several hours. Currently this document set shares
>> the same forest as other document sets. It's not possible to split these up
>> because the app needs to cross-query across all the document sets.
>> Any suggestions on how to accomplish this?
>
> What happens if you try loading everything as part of a single XCC call,
> passing the large array of files?
>
> If you want to follow Wayne's advice on using collections, I suppose you'd
> want to put each batch of docs in a uniquely named collection. Then you can
> run your queries against fn:collection($seq), where $seq is the sequence of
> collections that have been loaded so far. Or, perhaps more simply, you can
> do a cts:not-query() against cts:collection-query("latest") and thus exclude
> the most recent batch but allow all other docs that were loaded before. It
> basically keeps the new collection in the dark. Handy, efficient, and if
> each batch gets its own ID then you can easily exclude any batch.
>
> Point-in-time queries would do something similar, and are suitable if you're
> always doing just one bulk load at a time. Then you can use the point in
> time to control the visibility.
>
> -jh-
>
> _______________________________________________
> General mailing list
> [email protected]
> http://xqzone.com/mailman/listinfo/general

------------------------------

Message: 3
Date: Thu, 18 Mar 2010 11:14:05 -0700
From: Jason Hunter <[email protected]>
Subject: Re: [MarkLogic Dev General] Unfiltered ok, but what of fragment loading
To: General Mark Logic Developer Discussion <[email protected]>

>> i.e. Even if a search can be performed without pagination, this does not
>> save one from blowing out the caches when the data is retrieved from the
>> docs? Pagination may still be required?

Others have answered how you can use range indexes to pull the data from documents without fetching the documents, but in answer to this specific question: the perk of an unfiltered search is that you can jump ahead arbitrarily deep -- so you can get the authors of documents 1,000,001 to 1,000,010, even without range indexes, using only 10 fragment reads. So you won't blow out any caches.

-jh-

------------------------------

Message: 4
Date: Thu, 18 Mar 2010 14:46:43 -0400
From: Wyatt VanderStucken <[email protected]>
Subject: [MarkLogic Dev General] MLSQL - JDOM version?
To: General Mark Logic Developer Discussion <[email protected]>

Greetings all (particularly -jh-),

I've been experimenting with the latest MLSQL and had a question regarding the jdom.jar file included with the MLSQL distribution. The MANIFEST.MF inside the .jar indicates that it is JDOM Implementation-Version: 1.0.1, but I don't see that version listed on the JDOM site (http://www.jdom.org/news/index.html) - it looks like it was built 9/14/2005...

Where it gets tricky is that I'm trying to add the MLSQL servlet to an existing Java webapp where JDOM is already in use (Implementation-Version: 1.0beta10)... When I use the 1.0beta10 version I get the following error:

java.lang.NoSuchMethodError: org.jdom.Element.addContent(Lorg/jdom/Content;)Lorg/jdom/Element;

The version bundled with MLSQL remedies the problem (as does JDOM version 1.0), but I'm concerned that deploying a newer version will break something.
Initial tests are good, but this is a large application with 30+ developers, so I'm not sure of all the code that depends on JDOM... Can you say with any degree of certainty that code written against JDOM 1.0beta10 will be compatible with JDOM version 1.0 or 1.0.1? If forced to, will MLSQL work with JDOM version 1.0?

Thanks in advance,
Wyatt

------------------------------

_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general

End of General Digest, Vol 69, Issue 66
***************************************
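For reference, Kelly's lexicon suggestion in Message 1 might be sketched as follows. This assumes an element range index exists on a hypothetical <author> element; the restricting word query is also hypothetical.

```xquery
(: Sketch: read author values straight from the range index,
   without loading any documents. Assumes an element range index
   on a hypothetical <author> element. :)
for $author in cts:element-values(
    xs:QName("author"),
    (),                        (: no starting value :)
    "frequency-order",         (: most frequent authors first :)
    cts:word-query("history")  (: hypothetical restricting query :)
)
return fn:concat($author, " (", cts:frequency($author), ")")
```

Because the values come from the lexicon, the 300 matching documents are never pulled into the expanded tree cache.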
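And Jason's collection trick from the "Hot Swapping" thread might look like this. The collection name "latest" and the application query are hypothetical placeholders.

```xquery
(: Sketch: hide an in-flight batch by excluding its collection.
   "latest" and the word query are hypothetical placeholders. :)
cts:search(
  fn:doc(),
  cts:and-query((
    cts:word-query("history"),
    cts:not-query(cts:collection-query("latest"))
  )),
  "unfiltered"
)
```

Once the batch is complete, dropping the cts:not-query() (or renaming the collection) makes the whole batch visible at once.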
