Hi Eliot,

I'd consider using taskbot (http://registry.demo.marklogic.com/package/taskbot), in combination with either $tb:OPTIONS-SYNC or $tb:OPTIONS-SYNC-UPDATE. It will make optimal use of the Task Server of the host on which you initiate the call. It doesn't scale endlessly, but it batches up the work automatically for you, and will get you a lot further fairly easily.
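
To illustrate the batching idea only, here is a rough sketch that uses built-in functions rather than taskbot itself, so it is not taskbot's API. It assumes the URI lexicon is enabled (cts:uris needs it), that a batch size of 500 is small enough for your expanded tree cache (a guess you would have to tune), and that the Task Server queue can hold one task per batch. Each batch runs as its own task, and the calling query collects the results.

```xquery
xquery version "1.0-ml";

(: Sketch only. $BATCH-SIZE is an assumed value, not a recommendation. :)
declare variable $BATCH-SIZE as xs:integer := 500;

(: URIs come from the lexicon, so no documents are loaded at this point. :)
let $uris := cts:uris((), (), cts:directory-query("/Default/", "infinity"))
let $batches := xs:integer(fn:ceiling(fn:count($uris) div $BATCH-SIZE))
let $futures :=
  for $b in 1 to $batches
  let $batch := fn:subsequence($uris, ($b - 1) * $BATCH-SIZE + 1, $BATCH-SIZE)
  return
    (: One Task Server task per batch; result=true returns a value future. :)
    xdmp:spawn-function(
      function() {
        for $uri in $batch
        let $article := fn:doc($uri)/Article
        where fn:exists($article)
        return
          <article-counts id="{$article/@id}" paras="{fn:count($article//p)}"/>
      },
      <options xmlns="xdmp:eval"><result>true</result></options>
    )
(: Reading the futures here waits for each task to finish. :)
return <results>{$futures}</results>
```

Only the small article-counts elements travel back to the caller, so the full documents are only ever expanded inside the individual tasks.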

Cheers,
Geert

On 5/23/17, 5:43 AM, "[email protected] on behalf of Eliot Kimber" <[email protected] on behalf of [email protected]> wrote:

>I haven't yet seen anything in the docs that directly addresses what I'm
>trying to do and suspect I'm simply missing some ML basics or just going
>about things the wrong way.
>
>I have a corpus of several hundred thousand docs (but could be millions,
>of course), where each doc is an average of 200K and several thousand
>elements.
>
>I want to analyze the corpus to get details about the number of specific
>subelements within each document, e.g.:
>
>for $article in cts:search(/Article, cts:directory-query("/Default/",
>"infinity"))[$start to $end]
>  return <article-counts id="{$article/@id}"
>paras="{count($article//p)}"/>
>
>I'm running this as a query from Oxygen (so I can capture the results
>locally so I can do other stuff with them).
>
>On the server I'm using I blow the expanded tree cache if I try to
>request more than about 20,000 docs.
>
>Is there a way to do this kind of processing over an arbitrarily large
>set *and* get the results back from a single query request?
>
>I think the only solution is to write the results back to the database
>and then fetch that as the last thing but I was hoping there was
>something simpler.
>
>Have I missed an obvious solution?
>
>Thanks,
>
>Eliot
>
>--
>Eliot Kimber
>http://contrext.com
>
>_______________________________________________
>General mailing list
>[email protected]
>Manage your subscription at:
>http://developer.marklogic.com/mailman/listinfo/general
