Steve - there is a "limit=nnn" option to those lexicon functions, which should be the fastest way to fetch a block of values, even if the positional predicate isn't optimized. Also, the second argument ($start) allows you to specify a start position *by value*.
So:

    let $values := cts:element-values( xs:QName("lexi"), "", "limit=1000")
    let $last := $values[1000]

say ... followed by

    cts:element-values( xs:QName("lexi"), $last, "limit=1000")

I guess you'd get some overlap, since the last value of one batch comes back as the first value of the next, but this shouldn't slow down as you progress through the list.

-Mike

On 6/2/2011 6:32 AM, Steve Mallen wrote:
> Hi all,
>
> I'm having problems processing a large lexicon of values and wondered if
> anyone had done something similar or had any ideas of how best to deal
> with them.
>
> Basically, I've got a set of several million distinct values, and I want
> to precompute a bunch of statistics for each of them (so that I can then
> facet/sort values on the computed statistic). So, my plan is to fetch
> all the values from the lexicon (storing them in a temp file, say), and
> then run an XQuery on each value and store the resulting information in
> a document (i.e. one stat document per value). I cannot do this in a
> single query as it would take far too long to iterate over all values
> and for all the computations and inserts.
>
> But I can't seem to figure out the best way of fetching and iterating
> over a Lexicon in MarkLogic (to pre-fetch the full set of lexicon
> values). In SQL, I'd use a CURSOR to fetch the values one by one, and
> then close the cursor at the end. There doesn't seem to be an analogous
> concept in XQuery or XCC. I've tried something along the following lines:
>
> (cts:element-values( xs:QName(lexi) ))[$start to $end]
>
> and fetching the values in blocks until I run out of values but I'm
> worried that this isn't very efficient, and I've got this nagging doubt
> that the above will never return the empty sequence when $start is past
> the end of the values. I'm not even sure how I should get a count of
> the number of distinct values (xdmp:estimate doesn't work on the result
> of cts:element-values()).
>
> So - do you guys know of a way of efficiently iterating over a large set
> of lexicon values without timing out the query on the server?
>
> If I'm missing an obvious solution, please let me know...
>
> -Steve
>
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general
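
For reference, the value-based paging suggested in the reply could be sketched as one small module that a client (an XCC program, say) calls once per batch, so that no single request has to walk the whole lexicon. This is only a sketch under assumptions: "lexi" is the stand-in element name used in the thread, the batch size of 1000 is arbitrary, the lexicon is assumed to hold string values, and the external variable name $start is made up for illustration. It also assumes the lexicon's collation matches the query's default collation, so that `ne` agrees with the lexicon about which values are equal.

```xquery
xquery version "1.0-ml";

(: Supplied by the caller on each request (hypothetical name):
   "" for the first batch, then the last value of the previous batch. :)
declare variable $start as xs:string external;

(: Arbitrary batch size, chosen for illustration only. :)
declare variable $limit as xs:integer := 1000;

let $values :=
  cts:element-values(xs:QName("lexi"), $start, fn:concat("limit=", $limit))
(: The start value itself is returned again at the head of the batch,
   so filter it out rather than handing it to the client twice. :)
return $values[. ne $start]
```

The client keeps calling with the last value it received and stops when a call returns the empty sequence, which is what happens once the only value left at or after $start is $start itself. One side effect of the filter is that an empty-string value in the lexicon, if there is one, would be skipped on the first call.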
