I haven’t yet seen anything in the docs that directly addresses what I’m trying
to do, and I suspect I’m simply missing some MarkLogic basics or just going
about things the wrong way.
I have a corpus of several hundred thousand docs (but it could be millions, of
course), where each doc averages around 200K in size and contains several
thousand elements.
I want to analyze the corpus to get details about the number of specific
subelements within each document, e.g.:
for $article in cts:search(/Article,
    cts:directory-query("/Default/", "infinity"))[$start to $end]
return <article-counts id="{$article/@id}" paras="{count($article//p)}"/>
I’m running this as a query from Oxygen (so I can capture the results locally
and do other stuff with them).
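For concreteness, the per-batch query I run looks roughly like this, with the
batch window passed in as external variables (just a sketch; the variable
handling is illustrative):

xquery version "1.0-ml";

(: batch window supplied by the calling tool; names are illustrative :)
declare variable $start as xs:integer external;
declare variable $end as xs:integer external;

for $article in cts:search(/Article,
    cts:directory-query("/Default/", "infinity"))[$start to $end]
return <article-counts id="{$article/@id}" paras="{count($article//p)}"/>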
On the server I’m using, I blow the expanded tree cache if I try to request
more than about 20,000 docs.
Is there a way to do this kind of processing over an arbitrarily large set
*and* get the results back from a single query request?
I think the only solution is to write the results back to the database and
then fetch them as the last step, but I was hoping there was something simpler.
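Something like this is what I have in mind for the write-back approach (just a
sketch; the /analysis/article-counts/ directory, the document naming, and the
wrapper element names are placeholders I made up):

xquery version "1.0-ml";

(: per-batch step: store the counts in the database instead of returning them :)
declare variable $start as xs:integer external;
declare variable $end as xs:integer external;

let $results :=
  <batch start="{$start}" end="{$end}">{
    for $article in cts:search(/Article,
        cts:directory-query("/Default/", "infinity"))[$start to $end]
    return <article-counts id="{$article/@id}" paras="{count($article//p)}"/>
  }</batch>
return
  xdmp:document-insert(
    concat("/analysis/article-counts/", $start, "-", $end, ".xml"),
    $results)

and then, once all the batches have run, pull everything back in one final
request with something like:

<all-article-counts>{
  cts:search(/batch,
    cts:directory-query("/analysis/article-counts/"))/article-counts
}</all-article-counts>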
Have I missed an obvious solution?
Thanks,
Eliot
--
Eliot Kimber
http://contrext.com