It's not so much the cache you need to worry about as the fragment reads off disk. To fetch the ten items starting at item #1,000,000, you really don't want to have to read the previous million fragments off disk; that's a lot of random seeks. That's where unfiltered helps you: it lets you jump ahead and read just the 10 that matter.
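A minimal sketch of that deep-paging pattern (the query and the `author` element name are illustrative, not from the original thread): an unfiltered search can be subscripted directly, the skip is resolved from the indexes, and only the ten selected fragments are read from disk.

```xquery
(: Jump straight to items 1,000,001 to 1,000,010 of an unfiltered search.
   Only these 10 fragments are fetched; the million items before them are
   skipped using the indexes alone. Names here are hypothetical. :)
let $results := cts:search(fn:doc(),
                           cts:word-query("marklogic"),
                           "unfiltered")[1000001 to 1000010]
return $results//author
```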
The range index, on the other hand, is how a site like MarkMail.org can give you statistics about your result set (the authors, etc.) without having to read the result set off disk.

-jh-

On Mar 18, 2010, at 12:16 PM, Paul M wrote:

> So unfiltered lets one go deep in paging...
> 1,000,001 to 1,000,010
> Filtered may max out the caches earlier:
> 200,001 to 200,010 maxes the next page out of cache.
>
> Memory and fragmentation are still the main factors affecting total records
> [990,000 to 1,000,010] // authors
> because if the fragments are small, KB vs MB, more can be loaded...
>
> P.S. The expanded cache is the one that will be used; this is the one that
> is filled from disk, correct?
>
> And range indexes can be used to avoid disk access altogether (for small
> bits of information).
>
> --- On Thu, 3/18/10, [email protected]
> <[email protected]> wrote:
>
> From: [email protected]
> <[email protected]>
> Subject: General Digest, Vol 69, Issue 66
> To: [email protected]
> Date: Thursday, March 18, 2010, 11:46 AM
>
> Send General mailing list submissions to
>     [email protected]
>
> To subscribe or unsubscribe via the World Wide Web, visit
>     http://xqzone.com/mailman/listinfo/general
> or, via email, send a message with subject or body 'help' to
>     [email protected]
>
> You can reach the person managing the list at
>     [email protected]
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of General digest..."
>
> Today's Topics:
>
>    1. RE: Unfiltered ok, but what of fragment loading (Kelly Stirman)
>    2. Re: "Hot Swapping" large data sets. (Jason Hunter)
>    3. Re: Unfiltered ok, but what of fragment loading (Jason Hunter)
>    4. MLSQL - JDOM version?
>       (Wyatt VanderStucken)
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Thu, 18 Mar 2010 10:33:34 -0700
> From: Kelly Stirman <[email protected]>
> Subject: [MarkLogic Dev General] RE: Unfiltered ok, but what of
>     fragment loading
> To: "[email protected]"
>     <[email protected]>
> Message-ID:
>     <[email protected]>
> Content-Type: text/plain; charset="us-ascii"
>
> If you want to get only the authors and their values, you should take a look
> at cts:element-values() or cts:element-attribute-values(). This will require
> creating a range index on the node where your authors are stored, but it will
> eliminate the need to pull all documents into memory.
>
> You can also use cts:frequency() to determine how frequently each author is
> mentioned across all 300 documents.
>
> Kelly
>
> Message: 2
> Date: Thu, 18 Mar 2010 07:07:16 -0700 (PDT)
> From: Paul M <[email protected]>
> Subject: [MarkLogic Dev General] Unfiltered ok, but what of fragment
>     loading
> To: [email protected]
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset="us-ascii"
>
> Say I perform an unfiltered search that resolves to 300 fragments. Since it
> was unfiltered, no fragments needed to be loaded into memory for the
> *search*; only the indexes were used. Now let's say I want the authors from
> all these fragments/docs (fragment = doc, since there is no fragmentation
> policy). Does the data still need to be loaded into memory for all 300 docs
> even if I only want a small piece? i.e. will the expanded/compressed caches
> (not certain?) need to be filled with 300 docs?
> i.e. Even if a search can be performed without pagination, this does not save
> one from blowing out the caches when the data is retrieved from the docs?
> Pagination may still be required?
>
> Any information is appreciated...
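A sketch of Kelly's lexicon suggestion, assuming an element range index exists on a hypothetical `author` element: both the values and their frequencies come straight from the in-memory range index, scoped to the same query that matched the fragments, with no document reads at all.

```xquery
(: Requires a string range index on <author> (hypothetical element name).
   cts:element-values() answers from the lexicon; cts:frequency() reports
   how many matching fragments each value occurs in. :)
for $author in cts:element-values(xs:QName("author"), (), (),
                                  cts:word-query("marklogic"))
return fn:concat($author, ": ", cts:frequency($author))
```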
>
> ------------------------------
>
> Message: 2
> Date: Thu, 18 Mar 2010 11:07:07 -0700
> From: Jason Hunter <[email protected]>
> Subject: Re: [MarkLogic Dev General] "Hot Swapping" large data sets.
> To: General Mark Logic Developer Discussion
>     <[email protected]>
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset="windows-1252"
>
> For a single batch load, I like that, but if you do repeated loads you'll
> have to create new roles for every batch to distinguish the new content
> from the old. It seems mentally cheaper/lighter to me to use collections.
> My 2c.
>
> -jh-
>
> On Mar 18, 2010, at 9:47 AM, Danny Sokolsky wrote:
>
> > The URI privilege does not control access to the document; it specifies
> > whether you can create a document in that URI space.
> >
> > You can do what Keith suggests by putting a read permission on each
> > document that is associated with a role. Then, when you are ready, grant
> > that role to a role your users already have. To do this, you would have to
> > add several permissions during the load. For example, you might add a read
> > and update permission for a "loader" role, and also add a read permission
> > for a "content-user" role. Then, after you are satisfied that your content
> > is the way you want it, you can give the "content-user" role to the users
> > of your application.
> >
> > -Danny
> >
> > From: [email protected]
> > [mailto:[email protected]] On Behalf Of Keith L.
> > Breinholt
> > Sent: Thursday, March 18, 2010 9:34 AM
> > To: General Mark Logic Developer Discussion
> > Subject: RE: [MarkLogic Dev General] "Hot Swapping" large data sets.
> >
> > Another way to allow you to load and update sets and then only make them
> > visible when you are done is to load the content with a unique URI
> > privilege that is assigned to your loader/enricher program.
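Danny's load-time permission scheme can be sketched as follows (the role names, URI, and query are illustrative; the roles would have to exist already):

```xquery
(: Load with read/update for a "loader" role plus read for a
   "content-user" role. Users see the document only once the
   "content-user" role is granted to a role they hold. :)
xdmp:document-load("/data/batch-0001/doc.xml",
  <options xmlns="xdmp:document-load">
    <permissions>{
      xdmp:permission("loader", "read"),
      xdmp:permission("loader", "update"),
      xdmp:permission("content-user", "read")
    }</permissions>
  </options>)
```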
> >
> > Then when you are done and the content is ready, you can add that privilege
> > to the role of any users/applications that need to see it. That way only
> > completed content is visible, and it appears ‘instantaneously’ when the
> > privilege is added to the role.
> >
> > Keith L. Breinholt
> > [email protected]
> >
> > From: [email protected]
> > [mailto:[email protected]] On Behalf Of Jason Hunter
> > Sent: Thursday, March 18, 2010 12:10 AM
> > To: General Mark Logic Developer Discussion
> > Subject: Re: [MarkLogic Dev General] "Hot Swapping" large data sets.
> >
> > On Mar 17, 2010, at 5:23 AM, Lee, David wrote:
> >
> > I need to update some largish (1G+) sets of documents fairly atomically.
> > That is, I'd like to update all the documents and perform some operations
> > like adding properties etc., then all at once make the updates visible.
> > The update process could take several hours. Currently this document set
> > shares the same forest as other document sets. It's not possible to split
> > these up because the app needs to cross-query across all the document
> > sets.
> >
> > Any suggestions on how to accomplish this?
> >
> > What happens if you try loading everything as part of a single XCC call,
> > passing the large array of files?
> >
> > If you want to follow Wayne's advice on using collections, I suppose you'd
> > want to put each batch of docs in a uniquely named collection. Then you
> > can run your queries against fn:collection($seq) where $seq is the
> > sequence of collections that have been loaded so far. Or, perhaps more
> > simply, you can do a cts:not-query() against cts:collection-query("latest")
> > and thus exclude the most recent batch but allow all other docs that were
> > loaded before. It keeps the new collection in the dark, basically. Handy,
> > efficient, and if each batch gets its own ID then you can easily exclude
> > any batch.
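Jason's collection trick, sketched (the collection name and query are illustrative): each batch loads into its own collection, and application queries wrap a not-query around the in-flight batch's collection so it stays invisible until the load commits.

```xquery
(: Match everything except documents in the batch still being loaded.
   "latest" is an illustrative collection name; swap in the current
   batch ID to exclude any batch. :)
cts:search(fn:doc(),
  cts:and-query((
    cts:word-query("marklogic"),
    cts:not-query(cts:collection-query("latest"))
  )),
  "unfiltered")[1 to 10]
```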
> >
> > Point-in-time queries would do something similar, and are suitable if
> > you're always doing just one bulk load at a time. Then you can use the
> > point in time to control the visibility.
> >
> > -jh-
> >
> > NOTICE: This email message is for the sole use of the intended recipient(s)
> > and may contain confidential and privileged information. Any unauthorized
> > review, use, disclosure or distribution is prohibited. If you are not the
> > intended recipient, please contact the sender by reply email and destroy
> > all copies of the original message.
> >
> > _______________________________________________
> > General mailing list
> > [email protected]
> > http://xqzone.com/mailman/listinfo/general
>
> ------------------------------
>
> Message: 3
> Date: Thu, 18 Mar 2010 11:14:05 -0700
> From: Jason Hunter <[email protected]>
> Subject: Re: [MarkLogic Dev General] Unfiltered ok, but what of
>     fragment loading
> To: General Mark Logic Developer Discussion
>     <[email protected]>
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset="us-ascii"
>
> > i.e. Even if a search can be performed without pagination, this does not
> > save one from blowing out the caches when the data is retrieved from the
> > docs? Pagination may still be required?
>
> Others have answered how you can use range indexes to pull the data from
> documents without fetching the documents, but in answer to this specific
> question, the perk of an unfiltered search is that you can jump ahead
> arbitrarily deep -- so you can get the authors of documents 1,000,001 to
> 1,000,010, even without range indexes, using only 10 fragment reads. So you
> won't blow out any caches.
>
> -jh-
>
> ------------------------------
>
> Message: 4
> Date: Thu, 18 Mar 2010 14:46:43 -0400
> From: Wyatt VanderStucken <[email protected]>
> Subject: [MarkLogic Dev General] MLSQL - JDOM version?
> To: General Mark Logic Developer Discussion
>     <[email protected]>
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Greetings all (particularly -jh-),
>
> I've been experimenting with the latest MLSQL, and had a question
> regarding the jdom.jar file that is included with the MLSQL
> distribution. The MANIFEST.MF inside the .jar indicates that it is JDOM
> Implementation-Version: 1.0.1, but I don't see that version listed on
> the JDOM site (http://www.jdom.org/news/index.html); it looks like it
> was built 9/14/2005...
>
> Where it gets tricky is that I'm trying to add the MLSQL servlet to an
> existing Java webapp where JDOM is already in use
> (Implementation-Version: 1.0beta10)...
>
> When I use the 1.0beta10 version I get the following error:
> java.lang.NoSuchMethodError:
> org.jdom.Element.addContent(Lorg/jdom/Content;)Lorg/jdom/Element;
>
> The version bundled with MLSQL remedies the problem (as does JDOM
> version 1.0), but I'm concerned that deploying a newer version will
> break something. Initial tests are good, but this is a large
> application with 30+ developers, so I'm not sure of all the code that
> depends on JDOM...
>
> Can you say with any degree of certainty that code written against JDOM
> 1.0beta10 will be compatible with JDOM version 1.0 or 1.0.1? If forced
> to, will MLSQL work with JDOM version 1.0?
>
> Thanks in advance,
> Wyatt
>
> ------------------------------
>
> End of General Digest, Vol 69, Issue 66
> ***************************************
_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general
