Steve - the lexicon functions take a "limit=N" option, which should be 
the fastest way to cap the result, even if the predicate isn't optimized.  
Also, the second argument lets you specify a start position *by value*.

So:

let $values := cts:element-values( xs:QName("lexi"), "", "limit=1000")
let $last := $values[last()]

say ...

followed by


cts:element-values( xs:QName("lexi"), $last, "limit=1000")

The start value is inclusive, so each batch begins with the last value 
of the previous one and you'd need to skip that one duplicate, but this 
shouldn't slow down as you progress through the list.
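
Put together as one query per batch, it might look roughly like the 
sketch below. This is untested; "lexi" is the placeholder element name 
from your message, and the external variable $start and the batch size 
are assumptions made for illustration. The caller (XCC, say) would pass 
"" for the first batch and the last value it received for each batch 
after that, stopping when a batch comes back empty.

```xquery
xquery version "1.0-ml";

(: Sketch only: "lexi" is a placeholder element name, and $start is a
   hypothetical external variable supplied by the caller.
   Pass "" for the first batch, then the last value already seen. :)
declare variable $start as xs:string external;

let $batch-size := 1000

(: Ask for one extra value, because the start value itself is returned
   again when it exists in the lexicon. :)
let $values :=
  cts:element-values(
    xs:QName("lexi"),
    $start,
    concat("limit=", $batch-size + 1))

(: Drop the value the previous batch already handled. :)
return
  if ($start eq "")
  then $values
  else $values[. ne $start]
```

Each call is a separate short query, so no single request has to walk 
the whole lexicon. The first batch can return one value more than the 
later ones, which shouldn't matter for this purpose.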

-Mike

On 6/2/2011 6:32 AM, Steve Mallen wrote:
> Hi all,
>
> I'm having problems processing a large lexicon of values and wondered if
> anyone had done something similar or had any ideas of how best to deal
> with them.
>
> Basically, I've got a set of several million distinct values, and I want
> to precompute a bunch of statistics for each of them (so that I can then
> facet/sort values on the computed statistic).  So, my plan is to fetch
> all the values from the lexicon (storing them in a temp file, say), and
> then run an XQuery on each value and store the resulting information in
> a document (i.e. one stat document per value).  I cannot do this in a
> single query as it would take far too long to iterate over all values
> and for all the computations and inserts.
>
> But I can't seem to figure out the best way of fetching and iterating
> over a Lexicon in MarkLogic (to pre-fetch the full set of lexicon
> values).  In SQL, I'd use a CURSOR to fetch the values one by one, and
> then close the cursor at the end.  There doesn't seem to be an analogous
> concept in XQuery or XCC.  I've tried something along the following lines:
>
>       (cts:element-values( xs:QName(lexi) ))[$start to $end]
>
> and fetching the values in blocks until I run out of values but I'm
> worried that this isn't very efficient, and I've got this nagging doubt
> that the above will never return the empty sequence when $start is past
> the end of the values.  I'm not even sure how I should get a count of
> the number of distinct values (xdmp:estimate doesn't work on the result
> of cts:element-values()).
>
> So - do you guys know of a way of efficiently iterating over a large set
> of lexicon values without timing out the query on the server?
>
> If I'm missing an obvious solution, please let me know...
>
> -Steve
>
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general
