[MarkLogic Dev General] Spawning to accumulate results

Schouten, Edgar J. (RB-NL) Fri, 30 Nov 2007 02:38:57 -0800

Hi,

I have keywords in my documents: 750K documents in total, of which 54K
have keywords, with 20K unique (distinct) keywords.
I wanted to make a list of the top 200 (most frequent) keywords.
So I'm looping over the distinct keywords and, for each $keyword, running

    xdmp:estimate(cts:search(//keyword, cts:element-value-query(xs:QName("keyword"), $keyword)))

This takes too long (over 10 minutes).
So I spawn the task (putting it on the Task Server) and have

    xdmp:node-insert-child(fn:doc("/report.xml")/table, <tr><td>{$keyword}</td><td>{$number}</td></tr>)

insert a row for each keyword.

In the beginning it handled 2 searches a second, meaning about 2.8 hours
for all 20K keywords. Fair enough.

But 12 hours later the average rate has dropped to 2 searches a minute,
with 2K keywords still to do, meaning about 16 more hours.
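The time estimates in the two paragraphs above are just remaining work divided by throughput. A small worked version, with Python used here only as a calculator (the function name is illustrative):

```python
def eta_hours(remaining: int, per_second: float) -> float:
    """Hours needed to process `remaining` searches at a constant rate."""
    return remaining / per_second / 3600.0


# At the start: 20K keywords at 2 searches per second.
start = eta_hours(20_000, 2.0)
# After 12 hours: 2K keywords left at 2 searches per minute.
later = eta_hours(2_000, 2.0 / 60.0)
# The per-search rate itself dropped by this factor.
slowdown = 2.0 / (2.0 / 60.0)
```

Here `start` is about 2.8 hours and `later` about 16.7 hours, and the throughput fell by a factor of 60 over the run.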

Was the xdmp:node-insert-child a bad idea? Is there a better way to get
the number of documents containing a specific keyword? Or is there a
better way to incrementally store the results of very long queries?

Does anyone have experience doing statistics of this kind?
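The statistic being asked about is a per-keyword document count followed by a top-N cut. As a language-neutral sketch in Python (not MarkLogic XQuery; the list-of-keyword-lists input is a hypothetical in-memory stand-in for the database), it can be computed in a single pass over the documents rather than with one search per distinct keyword:

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_keywords(docs: Iterable[Iterable[str]], n: int = 200) -> List[Tuple[str, int]]:
    """Return the n keywords that occur in the most documents.

    Each document is modelled (hypothetically) as an iterable of its
    keyword values. A keyword counts at most once per document, which
    matches the question "how many documents contain this keyword?".
    """
    doc_freq: Counter = Counter()
    for keywords in docs:
        # set() so a keyword repeated inside one document is counted once.
        doc_freq.update(set(keywords))
    # most_common() sorts by count, highest first, and keeps the top n.
    return doc_freq.most_common(n)


docs = [["xml", "xquery"], ["xml"], ["xml", "search", "xml"], [], ["xquery"]]
print(top_keywords(docs, 2))  # [('xml', 3), ('xquery', 2)]
```

With 20K distinct keywords, the loop described in the post issues 20K separate searches, while this single pass visits each keyword-bearing document once and keeps only a running count per keyword.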

Regards
EdgarS

-----Original message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On behalf of Dominic Mitchell
Sent: Friday 30 November 2007 10:51
To: General Mark Logic Developer Discussion
Subject: Re: [MarkLogic Dev General] Needing to wait for document
update

Neil Bradley wrote:
> In any case, the page redirected to has the following in the <head> tag
> to prevent such problems:
> 
>     <meta http-equiv="Expires" content="-1" />
>     <meta http-equiv="Pragma" content="no-cache"/>
>     <meta http-equiv="Cache-Control" content="no-cache"/>

I would not tend to rely on these.  It's better to explicitly set the
HTTP headers.
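A minimal sketch of setting these as real HTTP response headers, assuming a Python standard-library server purely for illustration (the handler name and page body are hypothetical; a MarkLogic application would set the same headers from its own code instead). It sends the three no-cache directives that the quoted <meta> tags try to express. The HTTP caching specification says an invalid Expires value such as "0" is to be treated as a time in the past, so "0" stands in for the "-1" used in the meta tag.

```python
from http.server import BaseHTTPRequestHandler

# Response headers mirroring the three <meta http-equiv> tags quoted above.
NO_CACHE_HEADERS = [
    ("Cache-Control", "no-cache"),
    ("Pragma", "no-cache"),
    ("Expires", "0"),
]


class RedirectTargetHandler(BaseHTTPRequestHandler):
    """Hypothetical handler for the page the redirect points at."""

    def do_GET(self):
        body = b"<html><body>updated</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        # Send the caching directives as headers, not as <meta> tags.
        for name, value in NO_CACHE_HEADERS:
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body)
```

To try it, serve the handler with HTTPServer(("127.0.0.1", 8000), RedirectTargetHandler).serve_forever() (the port is arbitrary) and inspect the response with curl -i.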

-Dom
_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general