So unfiltered lets one page arbitrarily deep, e.g. to results 1,000,001 to 1,000,010, while filtered may max out the caches much earlier, e.g. around results 200,001 to 200,010, with the next page falling out of cache.
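That deep-paging behavior might be sketched as follows. This is a sketch only: the word query and the author element are hypothetical placeholders, and it assumes an unfiltered search is resolved entirely from the indexes, so only the fragments actually fetched are loaded.

```xquery
(: Sketch only: deep paging over an unfiltered search.
   The word query and the author element are hypothetical. :)
let $results := cts:search(fn:doc(), cts:word-query("history"), "unfiltered")
return
  (: Only these ten fragments are fetched from disk;
     the search itself was resolved from the indexes. :)
  fn:subsequence($results, 1000001, 10)//author
```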
Memory and fragmentation are still the main factors affecting the total records [990,000 to 1,000,010] // authors, because if the fragments are small (KB vs. MB), more of them can be loaded.

P.S. The expanded cache is the one that will be used, i.e. the one that is filled from disk, correct? And range indexes can be used to avoid disk access altogether (for small bits of information)?

--- On Thu, 3/18/10, [email protected] <[email protected]> wrote:

From: [email protected] <[email protected]>
Subject: General Digest, Vol 69, Issue 66
To: [email protected]
Date: Thursday, March 18, 2010, 11:46 AM

Send General mailing list submissions to [email protected]

To subscribe or unsubscribe via the World Wide Web, visit
http://xqzone.com/mailman/listinfo/general
or, via email, send a message with subject or body 'help' to
[email protected]

You can reach the person managing the list at [email protected]

When replying, please edit your Subject line so it is more specific than "Re: Contents of General digest..."

Today's Topics:

   1. RE: Unfiltered ok, but what of fragment loading (Kelly Stirman)
   2. Re: "Hot Swapping" large data sets. (Jason Hunter)
   3. Re: Unfiltered ok, but what of fragment loading (Jason Hunter)
   4. MLSQL - JDOM version? (Wyatt VanderStucken)

----------------------------------------------------------------------

Message: 1
Date: Thu, 18 Mar 2010 10:33:34 -0700
From: Kelly Stirman <[email protected]>
Subject: [MarkLogic Dev General] RE: Unfiltered ok, but what of fragment loading
To: "[email protected]" <[email protected]>

If you want to get only the authors and their values, you should take a look at cts:element-values() or cts:element-attribute-values(). This will require creating a range index on the node where your authors are stored, but it will eliminate the need to pull all the documents into memory.
You can also use cts:frequency() to determine how frequently each author is mentioned across all 300 documents.

Kelly

------------------------------

Message: 2
Date: Thu, 18 Mar 2010 07:07:16 -0700 (PDT)
From: Paul M <[email protected]>
Subject: [MarkLogic Dev General] Unfiltered ok, but what of fragment loading
To: [email protected]

Say I perform an unfiltered search that resolves to 300 fragments. Since it was unfiltered, no fragments needed to be loaded into memory for the *search* itself; only the indexes were used. Now let's say I want the authors from all of these fragments/docs (fragment = doc, since there is no fragmentation policy). Does the data still need to be loaded into memory for all 300 docs, even if I only need a small piece of each? I.e., will the expanded/compressed caches (not certain which?) need to be filled with 300 docs? I.e., even if a search can be performed without pagination, does that not leave one open to blowing out the caches when the data is retrieved from the docs? May pagination still be required? Any information is appreciated...

------------------------------

Message: 2
Date: Thu, 18 Mar 2010 11:07:07 -0700
From: Jason Hunter <[email protected]>
Subject: Re: [MarkLogic Dev General] "Hot Swapping" large data sets.
To: General Mark Logic Developer Discussion <[email protected]>

For a single batch load I like that, but if you do repeated loads you'll have to create new roles for every batch to distinguish the new content from the old. It seems mentally cheaper/lighter to me to use collections. My 2c.

-jh-

On Mar 18, 2010, at 9:47 AM, Danny Sokolsky wrote:

> The URI privilege does not control access to the document; it specifies
> whether you can create a document in that URI space.
>
> You can do what Keith suggests by putting a read permission on each document
> that is associated with a role.
> Then, when you are ready, grant that role to a role your users already have.
> To do this, you would have to add several permissions during the load. For
> example, you might add a read and update permission for a “loader” role, and
> also add a read permission for a “content-user” role. Then, after you are
> satisfied that your content is the way you want it, you can give the
> “content-user” role to the users of your application.
>
> -Danny
>
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Keith L. Breinholt
> Sent: Thursday, March 18, 2010 9:34 AM
> To: General Mark Logic Developer Discussion
> Subject: RE: [MarkLogic Dev General] "Hot Swapping" large data sets.
>
> Another way to let you load and update sets, and only make them visible when
> you are done, is to load the content with a unique URI privilege that is
> assigned to your loader/enricher program.
>
> Then, when you are done and the content is ready, you can add that privilege
> to the role of any users/applications that need to see it. That way only
> completed content is visible, and it appears ‘instantaneously’ when the
> privilege is added to the role.
>
> Keith L. Breinholt
> [email protected]
>
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Jason Hunter
> Sent: Thursday, March 18, 2010 12:10 AM
> To: General Mark Logic Developer Discussion
> Subject: Re: [MarkLogic Dev General] "Hot Swapping" large data sets.
>
> On Mar 17, 2010, at 5:23 AM, Lee, David wrote:
>
>> I need to update some largish (1 GB+) sets of documents fairly atomically.
>> That is, I'd like to update all the documents and perform some operations,
>> like adding properties etc., then all at once make the updates visible. The
>> update process could take several hours. Currently this document set shares
>> the same forest as other document sets. It's not possible to split these up
>> because the app needs to cross-query across all the document sets.
>> Any suggestions on how to accomplish this?
>
> What happens if you try loading everything as part of a single XCC call,
> passing the large array of files?
>
> If you want to follow Wayne's advice on using collections, I suppose you'd
> want to put each batch of docs in a uniquely named collection. Then you can
> run your queries against fn:collection($seq), where $seq is the sequence of
> collections that have been loaded so far. Or, perhaps more simply, you can
> do a cts:not-query() against cts:collection-query("latest") and thus exclude
> the most recent batch but allow all other docs that were loaded before. It
> basically keeps the new collection in the dark. Handy, efficient, and if
> each batch gets its own ID then you can easily exclude any batch.
>
> Point-in-time queries would do something similar, and are suitable if you're
> always doing just one bulk load at a time. Then you can use the point in
> time to control the visibility.
>
> -jh-
>
> _______________________________________________
> General mailing list
> [email protected]
> http://xqzone.com/mailman/listinfo/general

------------------------------

Message: 3
Date: Thu, 18 Mar 2010 11:14:05 -0700
From: Jason Hunter <[email protected]>
Subject: Re: [MarkLogic Dev General] Unfiltered ok, but what of fragment loading
To: General Mark Logic Developer Discussion <[email protected]>

>> i.e. Even if a search can be performed without pagination, this does not
>> save one from blowing out the caches when the data is retrieved from the
>> docs? Pagination may still be required?

Others have answered how you can use range indexes to pull the data from documents without fetching the documents, but in answer to this specific question: the perk of an unfiltered search is that you can jump ahead arbitrarily deep -- so you can get the authors of documents 1,000,001 to 1,000,010, even without range indexes, using only 10 fragment reads. So you won't blow out any caches.

-jh-

------------------------------

Message: 4
Date: Thu, 18 Mar 2010 14:46:43 -0400
From: Wyatt VanderStucken <[email protected]>
Subject: [MarkLogic Dev General] MLSQL - JDOM version?
To: General Mark Logic Developer Discussion <[email protected]>

Greetings all (particularly -jh-),

I've been experimenting with the latest MLSQL and had a question regarding the jdom.jar file included with the MLSQL distribution. The MANIFEST.MF inside the .jar indicates that it is JDOM Implementation-Version: 1.0.1, but I don't see that version listed on the JDOM site (http://www.jdom.org/news/index.html) - it looks like it was built 9/14/2005...

Where it gets tricky is that I'm trying to add the MLSQL servlet to an existing Java webapp where JDOM is already in use (Implementation-Version: 1.0beta10)... When I use the 1.0beta10 version I get the following error:

java.lang.NoSuchMethodError: org.jdom.Element.addContent(Lorg/jdom/Content;)Lorg/jdom/Element;

The version bundled with MLSQL remedies the problem (as does JDOM version 1.0), but I'm concerned that deploying a newer version will break something.
Initial tests are good, but this is a large application with 30+ developers, so I'm not sure of all the code that depends on JDOM... Can you say with any degree of certainty that code written against JDOM 1.0beta10 will be compatible with JDOM version 1.0 or 1.0.1? If forced to, will MLSQL work with JDOM version 1.0?

Thanks in advance,
Wyatt

------------------------------

_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general

End of General Digest, Vol 69, Issue 66
***************************************
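For reference, Kelly's lexicon suggestion in Message 1 might be sketched as follows. This assumes an element range index exists on a hypothetical <author> element; the restricting word query is also hypothetical.

```xquery
(: Sketch: read author values straight from the range index,
   without loading any documents. Assumes an element range index
   on a hypothetical <author> element. :)
for $author in cts:element-values(
    xs:QName("author"),
    (),                        (: no starting value :)
    "frequency-order",         (: most frequent authors first :)
    cts:word-query("history")  (: hypothetical restricting query :)
)
return fn:concat($author, " (", cts:frequency($author), ")")
```

Because the values come from the lexicon, the 300 matching documents are never pulled into the expanded tree cache.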
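And Jason's collection trick from the "Hot Swapping" thread might look like this. The collection name "latest" and the application query are hypothetical placeholders.

```xquery
(: Sketch: hide an in-flight batch by excluding its collection.
   "latest" and the word query are hypothetical placeholders. :)
cts:search(
  fn:doc(),
  cts:and-query((
    cts:word-query("history"),
    cts:not-query(cts:collection-query("latest"))
  )),
  "unfiltered"
)
```

Once the batch is complete, dropping the cts:not-query() (or renaming the collection) makes the whole batch visible at once.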
