Hi,
I have keywords in my documents: of 750K documents, 54K have keywords,
and there are 20K unique (distinct) keywords.
I wanted to make a list of the top 200 (most frequent) keywords.
So I'm looping over the distinct keywords and, for each one, doing an
xdmp:estimate(cts:search(//keyword, cts:element-value-query(xs:QName("keyword"), $keyword)))
That takes too long (over 10 minutes).
So I spawn the task, put it on the Task Server, and have an
xdmp:node-insert-child(fn:doc("/report.xml")/table, <tr><td>{$keyword}</td><td>{$number}</td></tr>)
insert a row for each keyword.
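A minimal sketch of one such spawned task, assuming it runs once per keyword (so each transaction does exactly one insert) and that /report.xml already contains a <table> root element. The module name count-keyword.xqy and the external variable $keyword are hypothetical, not from the original setup.

```xquery
xquery version "1.0-ml";

(: Hypothetical task module (count-keyword.xqy), spawned once per keyword.
   Assumes /report.xml already exists with a <table> root element. :)
declare variable $keyword as xs:string external;

(: xdmp:estimate gives an unfiltered, index-only count of the
   fragments whose <keyword> element equals $keyword :)
let $number := xdmp:estimate(
  cts:search(//keyword,
    cts:element-value-query(xs:QName("keyword"), $keyword)))
return
  (: append one result row to the report table :)
  xdmp:node-insert-child(
    fn:doc("/report.xml")/table,
    <tr><td>{$keyword}</td><td>{$number}</td></tr>)
```

A driver query could then queue one task per keyword $k with xdmp:spawn("count-keyword.xqy", (xs:QName("keyword"), $k)).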
At first it handled 2 searches a second, meaning about 2.8 hours for
all 20K keywords. Fair enough.
But 12 hours later, the average has dropped to 2 searches a minute, with
2K keywords still to do, meaning 16 more hours.
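The throughput figures above work out as follows; the later rate (2 per minute) is 60 times slower than the initial one (2 per second):

```latex
\frac{20\,000\ \text{keywords}}{2\ \text{searches/s}} = 10\,000\ \text{s} \approx 2.8\ \text{h}
\qquad
\frac{2\,000\ \text{keywords}}{2\ \text{searches/min}} = 1\,000\ \text{min} \approx 16.7\ \text{h}
```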
Was xdmp:node-insert-child a bad idea? Is there a better way to get
the number of documents containing a specific keyword? Or is there a
better way to incrementally store the results of very long queries?
Does anyone have experience with computing statistics of this kind?
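One alternative that might avoid the per-keyword searches entirely, sketched here as a swapped-in technique rather than the approach described above: with an element range index on <keyword> (assumed here: type xs:string, default collation), cts:element-values can return the values ordered by frequency, and cts:frequency gives the count for each one. Without that index, cts:element-values raises an error.

```xquery
xquery version "1.0-ml";

(: Sketch, assuming an xs:string element range index on <keyword>.
   Returns the 200 most frequent keywords, most frequent first;
   "fragment-frequency" counts matching fragments (documents, when
   each document is a single fragment). :)
<table>{
  for $value in cts:element-values(
      xs:QName("keyword"), (),
      ("frequency-order", "descending", "fragment-frequency", "limit=200"))
  return
    <tr><td>{$value}</td><td>{cts:frequency($value)}</td></tr>
}</table>
```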
Regards
EdgarS
-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Dominic Mitchell
Sent: Friday, 30 November 2007 10:51
To: General Mark Logic Developer Discussion
Subject: Re: [MarkLogic Dev General] Needing to wait for document update
Neil Bradley wrote:
> In any case, the page redirected to has the following in the <head> tag
> to prevent such problems:
>
> <meta http-equiv="Expires" content="-1" />
> <meta http-equiv="Pragma" content="no-cache"/>
> <meta http-equiv="Cache-Control" content="no-cache"/>
I would not tend to rely on these. It's better to explicitly set the
HTTP headers.
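A minimal sketch of setting the same no-cache directives as real HTTP response headers from an XQuery page, using xdmp:add-response-header; the header values simply mirror the meta tags quoted above, and the returned page body is a placeholder.

```xquery
xquery version "1.0-ml";

(: Send the caching directives as actual HTTP headers instead of
   relying on <meta http-equiv> tags inside the page. :)
xdmp:add-response-header("Cache-Control", "no-cache"),
xdmp:add-response-header("Pragma", "no-cache"),
xdmp:add-response-header("Expires", "-1"),
(: placeholder page body :)
<html><body><p>Updated.</p></body></html>
```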
-Dom
_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general