Hi all,

I'm having problems processing a large lexicon of values, and I wondered 
if anyone had done something similar or had any ideas on the best way to 
deal with it.

Basically, I've got a set of several million distinct values, and I want 
to precompute a bunch of statistics for each of them (so that I can then 
facet/sort values on the computed statistic).  So, my plan is to fetch 
all the values from the lexicon (storing them in a temp file, say), and 
then run an XQuery on each value and store the resulting information in 
a document (i.e. one stat document per value).  I can't do this in a 
single query, as it would take far too long to iterate over all the 
values and perform all the computations and inserts.
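
For the per-value step, I'm picturing something along these lines (a 
rough sketch only -- the element name "lexi", the /stats/ URI scheme, 
and the "count" statistic are just placeholders for whatever I actually 
compute):

     (: one stat document per lexicon value :)
     let $value := "some-value"
     let $count := xdmp:estimate(
       cts:search(fn:doc(),
                  cts:element-value-query(xs:QName("lexi"), $value)))
     return xdmp:document-insert(
       fn:concat("/stats/", xdmp:hash64($value), ".xml"),
       <stat><value>{$value}</value><count>{$count}</count></stat>)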

But I can't seem to figure out the best way of fetching and iterating 
over a Lexicon in MarkLogic (to pre-fetch the full set of lexicon 
values).  In SQL, I'd use a CURSOR to fetch the values one by one, and 
then close the cursor at the end.  There doesn't seem to be an analogous 
concept in XQuery or XCC.  I've tried something along the following lines:

     (cts:element-values( xs:QName("lexi") ))[$start to $end]

and fetching the values in blocks until I run out of values, but I'm 
worried that this isn't very efficient, and I've got a nagging doubt 
that the above will never return the empty sequence once $start is past 
the end of the values.  I'm not even sure how I should get a count of 
the number of distinct values (xdmp:estimate doesn't work on the result 
of cts:element-values()).
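
The closest thing I've found to a cursor is to anchor each block on the 
last value of the previous one, using the optional $start argument and 
the "limit=N" option of cts:element-values() (if I'm reading the docs 
right -- the block size and element name below are placeholders):

     (: fetch the next block of lexicon values after $anchor;
        pass () as $anchor for the first block :)
     declare function local:next-block($anchor as xs:anyAtomicType?)
     {
       if (fn:empty($anchor))
       then cts:element-values(xs:QName("lexi"), (), "limit=1000")
       else
         (: $start is inclusive, so ask for one extra value and
            drop the anchor itself :)
         fn:remove(
           cts:element-values(xs:QName("lexi"), $anchor, "limit=1001"),
           1)
     };

The idea would be to call this repeatedly from XCC until it returns the 
empty sequence, but I don't know whether that's any more efficient than 
the positional-predicate version above.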

So - do you guys know of a way of efficiently iterating over a large set 
of lexicon values without timing out the query on the server?

If I'm missing an obvious solution, please let me know...

-Steve

_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
