This is something of a FAQ; try searching MarkMail:

http://markmail.org/search/list:com.marklogic.developer.general%20distinct-values

The second hit is fairly relevant: http://markmail.org/message/7zg4v67k3n5tp3rm

You might also be interested in cts:frequency():

http://developer.marklogic.com/pubs/3.2/apidocs/SearchBuiltins.html#frequency
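Here is a minimal sketch of the lexicon approach. It assumes an element range index of type xs:string on `keyword`, which cts:element-values() requires, and the 1.0-ml dialect of later MarkLogic releases, so syntax on 3.2 may differ. Instead of one search per keyword, it reads the distinct values from the lexicon already sorted by frequency, keeps the first 200, and takes each count from cts:frequency(). The counts are fragment counts, so they equal document counts only when each document is a single fragment.

```xquery
xquery version "1.0-ml";
(: Top 200 keywords, most frequent first, read from the value lexicon.
   ASSUMPTION: an element range index (xs:string) exists on <keyword>;
   without it cts:element-values() raises an error. :)
let $top :=
  fn:subsequence(
    cts:element-values(
      xs:QName("keyword"),
      (),  (: no start value: begin with the first value :)
      ("frequency-order", "descending", "fragment-frequency")),
    1, 200)
(: cts:frequency() returns the lexicon count for each value returned
   by the lexicon call above, here the number of fragments. :)
return
  <table>{
    for $keyword in $top
    return <tr><td>{ $keyword }</td><td>{ cts:frequency($keyword) }</td></tr>
  }</table>
```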

-- Mike

Schouten, Edgar J. (RB-NL) wrote:
Hi,

I have keywords in my documents: 750K documents in total, of which 54K
have keywords, with 20K unique (distinct) keywords.
I wanted to make a list of the top 200 (most frequent) keywords.
So I'm looping over the distinct keywords and doing an
xdmp:estimate(cts:search(//keyword, cts:element-value-query(xs:QName("keyword"), $keyword)))
for each one. It takes too long (over 10 minutes).
So I spawn the task on the Task Server and have an
xdmp:node-insert-child(fn:doc("/report.xml")/table, <tr><td>{$keyword}</td><td>{$number}</td></tr>)
call insert a row for each keyword.

In the beginning, it would handle 2 searches a second, meaning 2.7h for
all 20K keywords. Fair enough.

But 12h later, the average has dropped to 2 searches a minute, with 2K
keywords still to do, meaning 16 more hours.

Was the xdmp:node-insert-child a bad idea? Is there a better way to get
the number of documents containing a specific keyword? Or is there a
better way to incrementally store results of very long queries?
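One plausible cause of the slowdown (an inference, not something measured here): every xdmp:node-insert-child call writes a new version of the /report.xml fragment, and that fragment grows by one row per keyword, so later inserts cost more than early ones. Below is a sketch of a per-keyword task body that writes each result to its own small document instead. The /report/ URI scheme and the placeholder keyword value are hypothetical, and it assumes the 1.0-ml dialect.

```xquery
xquery version "1.0-ml";
(: Hypothetical per-keyword task body. In practice $keyword would be
   supplied by the code that spawns the task; "example" is a placeholder. :)
let $keyword := "example"
(: Estimate how many fragments have a <keyword> with exactly this value. :)
let $number :=
  xdmp:estimate(
    cts:search(//keyword,
      cts:element-value-query(xs:QName("keyword"), $keyword)))
(: One small document per keyword, so no single fragment keeps growing. :)
return
  xdmp:document-insert(
    fn:concat("/report/", fn:encode-for-uri($keyword), ".xml"),
    <tr><td>{ $keyword }</td><td>{ $number }</td></tr>)
```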

Does anyone have experience doing statistics of this kind?

Regards
EdgarS

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Dominic Mitchell
Sent: Friday, 30 November 2007 10:51
To: General Mark Logic Developer Discussion
Subject: Re: [MarkLogic Dev General] Needing to wait for document update

Neil Bradley wrote:
In any case, the page redirected to has the following in the <head> tag to prevent such problems:

    <meta http-equiv="Expires" content="-1" />
    <meta http-equiv="Pragma" content="no-cache"/>
    <meta http-equiv="Cache-Control" content="no-cache"/>

I would not tend to rely on these.  It's better to explicitly set the
HTTP headers.
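Following that suggestion, here is a sketch that sends the same directives as real HTTP response headers from the XQuery page itself, using xdmp:add-response-header() in the 1.0-ml dialect. The header values mirror the meta tags above. The page title and body are placeholders.

```xquery
xquery version "1.0-ml";
(: Set caching directives as HTTP response headers instead of relying on
   <meta http-equiv> tags. Each call returns the empty sequence, so the
   module's result is just the <html> element. :)
xdmp:add-response-header("Cache-Control", "no-cache"),
xdmp:add-response-header("Pragma", "no-cache"),
xdmp:add-response-header("Expires", "-1"),
xdmp:set-response-content-type("text/html"),
<html>
  <head><title>Placeholder title</title></head>
  <body><p>Page content goes here.</p></body>
</html>
```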

-Dom
_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general