I would also suggest you use the task server ... Break the big job into smaller jobs (processing, say, 1000 values in each go) and then spawn these tasks on the task server. I do this fairly often and it has worked well for me.
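A minimal sketch of that batching approach, assuming an already-deployed module that computes the stats for one batch (the module path `/tasks/process-batch.xqy`, the element name `lexi`, and the variable names are all illustrative, not from the thread):

```xquery
(: Fetch the lexicon values, slice them into batches of 1000, and spawn
   one task-server job per batch. The spawned module receives its slice
   via the external variable $batch. :)
let $values := cts:element-values(xs:QName("lexi"))
let $batch-size := 1000
for $i in (1 to fn:ceiling(fn:count($values) div $batch-size))
let $start := ($i - 1) * $batch-size + 1
return
  xdmp:spawn(
    "/tasks/process-batch.xqy",
    (xs:QName("batch"), $values[$start to $start + $batch-size - 1])
  )
```

Each spawned task runs in its own transaction on the task server, so no single query has to hold the full multi-million-value workload.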
Darin.

On Jun 2, 2011, at 7:55 AM, "Michael Sokolov" <[email protected]> wrote:

> Steve - there is a "limit=nnn" option to those lexicon functions that
> should be the fastest thing, even if the predicate isn't optimized.
> Also, the first argument allows you to specify a start position *by value*.
>
> So:
>
>     $values := cts:element-values(xs:QName(lexi), "", "limit=1000")
>     $last := $values[1000]
>
> say ...
>
> followed by
>
>     cts:element-values(xs:QName(lexi), $last, "limit=1000")
>
> I guess you'd get some overlap between the first and last values of
> subsequent iterations, but this shouldn't slow down as you progress
> through the list.
>
> -Mike
>
> On 6/2/2011 6:32 AM, Steve Mallen wrote:
>> Hi all,
>>
>> I'm having problems processing a large lexicon of values and wondered if
>> anyone had done something similar or had any ideas of how best to deal
>> with them.
>>
>> Basically, I've got a set of several million distinct values, and I want
>> to precompute a bunch of statistics for each of them (so that I can then
>> facet/sort values on the computed statistic). So, my plan is to fetch
>> all the values from the lexicon (storing them in a temp file, say), and
>> then run an XQuery on each value and store the resulting information in
>> a document (i.e. one stat document per value). I cannot do this in a
>> single query as it would take far too long to iterate over all the values
>> and do all the computations and inserts.
>>
>> But I can't seem to figure out the best way of fetching and iterating
>> over a lexicon in MarkLogic (to pre-fetch the full set of lexicon
>> values). In SQL, I'd use a CURSOR to fetch the values one by one and
>> then close the cursor at the end. There doesn't seem to be an analogous
>> concept in XQuery or XCC.
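Michael's start-value/limit idea can be wrapped in a recursive walk along these lines (a sketch only: the element name `lexi`, the batch size, and the processing step are placeholders, and in practice you would likely spawn each batch as a task rather than recurse over millions of values in one query):

```xquery
(: Walk the lexicon 1000 values at a time, restarting each fetch at the
   last value seen. Because the start position is inclusive, every pass
   after the first re-fetches the previous last value, so skip it. :)
declare function local:walk-lexicon($start as xs:anyAtomicType?)
{
  let $values := cts:element-values(xs:QName("lexi"), $start, "limit=1000")
  let $fresh :=
    if (fn:empty($start)) then $values
    else fn:subsequence($values, 2)
  return
    if (fn:empty($fresh)) then ()
    else (
      (: ... process $fresh here ... :)
      local:walk-lexicon($fresh[fn:last()])
    )
};

local:walk-lexicon(())
```

Seeking by value rather than by position is the key point: each fetch is an index lookup starting at `$start`, so later batches cost no more than earlier ones, unlike a positional slice `[$start to $end]`, which must skip past everything before `$start` on every call.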
>> I've tried something along the following lines:
>>
>>     (cts:element-values(xs:QName(lexi)))[$start to $end]
>>
>> fetching the values in blocks until I run out of values, but I'm
>> worried that this isn't very efficient, and I've got this nagging doubt
>> that the above will never return the empty sequence when $start is past
>> the end of the values. I'm not even sure how I should get a count of
>> the number of distinct values (xdmp:estimate doesn't work on the result
>> of cts:element-values()).
>>
>> So - do you guys know of a way of efficiently iterating over a large set
>> of lexicon values without timing out the query on the server?
>>
>> If I'm missing an obvious solution, please let me know...
>>
>> -Steve

_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
