cts:walk can take some time too, simply because the query is so large. My 
test took about 30 seconds for a 100 kB XML document. This could be capped using 
xdmp:elapsed-time and cts:action. I also found that it could be reduced to 
about 8 seconds by rebuilding the XML in a simpler form:

    element words {
      for $w in cts:tokenize($new-document)[. instance of cts:word]
      return element word { $w } }
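A minimal sketch of the time cap mentioned above. It assumes the vocabulary lives in terms.xml with one <term> element per entry (the layout Will uses below), takes the new document as a hypothetical external variable, and uses a 5-second budget chosen arbitrarily. cts:walk evaluates its third argument once per match, and setting $cts:action to "break" stops the walk, so the result may be partial when the budget runs out.

```xquery
xquery version "1.0-ml";

(: Hypothetical input: the user's new, not-yet-inserted document. :)
declare variable $new-document as node() external;

(: Every vocabulary term; terms.xml layout is assumed. :)
let $terms := fn:doc('terms.xml')//term/fn:string()
(: A word-query over a sequence of strings matches any of them. :)
let $query := cts:word-query($terms)
(: Arbitrary time budget for the walk. :)
let $budget := xs:dayTimeDuration('PT5S')
return fn:distinct-values(
  cts:walk($new-document, $query,
    (: Once over budget, stop walking; xdmp:set returns the empty
       sequence, so nothing is added for this match. :)
    if (xdmp:elapsed-time() gt $budget)
    then xdmp:set($cts:action, "break")
    (: Otherwise return the matched text as it appears in the document. :)
    else fn:string($cts:text)))
```

Note that the returned strings are the matched text as written in the document, not the canonical spelling from terms.xml.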

Then I remembered the reverse-query feature. With the fast reverse-query index 
enabled, the lookup could be very efficient.

  cts:search(
    xdmp:directory('vocabulary/', 'infinity'),
    cts:reverse-query($new-document))
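For cts:reverse-query to find anything, each vocabulary term has to be stored as a serialized query. A minimal sketch of loading them, assuming the same terms.xml layout; the vocabulary/<n>.xml URI scheme and the <tag> wrapper element are hypothetical choices made for illustration.

```xquery
xquery version "1.0-ml";

(: Store one serialized cts:word-query per vocabulary term under the
   vocabulary/ directory, so cts:reverse-query can match against them.
   terms.xml, the URI scheme and the <tag> wrapper are hypothetical. :)
for $term at $i in fn:doc('terms.xml')//term/fn:string()
return xdmp:document-insert(
  fn:concat('vocabulary/', $i, '.xml'),
  (: Placing a cts:query inside an element constructor serializes it
     as a cts:word-query XML element. :)
  <tag>{ cts:word-query($term) }</tag>)
```

Position-based URIs are simple but get renumbered if the term list is reloaded in a different order; a URI derived from the term itself would be more stable.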

Without the fast reverse-query index, this took about 10 seconds for my test 
document, which was already faster than cts:walk. That can be cut to about 
3 seconds by passing a simplified version of the document, just its text node:

  cts:search(
    xdmp:directory('vocabulary/', 'infinity'),
    cts:reverse-query(text { $new-document }))
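To turn the matching vocabulary documents back into tag suggestions, each hit can be parsed back into a query with cts:query and its text read with cts:word-query-text. A minimal sketch, assuming each vocabulary document wraps one serialized cts:word-query in a hypothetical <tag> element and that the new document arrives as an external variable.

```xquery
xquery version "1.0-ml";

(: Hypothetical input: the user's new document. :)
declare variable $new-document as node() external;

(: Each hit is a stored query document; the <tag> wrapper is assumed. :)
for $hit in cts:search(
    xdmp:directory('vocabulary/', 'infinity'),
    cts:reverse-query(text { $new-document }))
(: Rebuild the cts:query from its XML form and read back the term. :)
return cts:word-query-text(cts:query($hit/tag/*))
```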

With the fast reverse-query index enabled, both versions were sub-second; in 
fact, less than 100 ms, although the text-node version was still 3x faster than 
the marked-up version. Anyway, I think reverse-query is the most efficient 
approach, and enabling fast reverse-query searches makes it very fast.
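The index can be switched on in the Admin Interface or, as in this minimal sketch, through the Admin API. It assumes it is run against the database that holds the vocabulary/ documents; changing an index setting may start a reindex of existing content.

```xquery
xquery version "1.0-ml";

import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: Enable "fast reverse searches" for the current database.
   Assumes this runs against the database holding vocabulary/. :)
let $config := admin:get-configuration()
let $config := admin:database-set-fast-reverse-searches(
  $config, xdmp:database(), fn:true())
return admin:save-configuration($config)
```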

-- Mike

On 24 May 2012, at 10:40, Will Thompson wrote:

> Matt,
>  
> I thought of this solution before I saw Mike’s post, but this *would* require 
> that the document be inserted first. It leverages the word lexicon, so it 
> should be fairly fast, although it still took a while when I tried something 
> similar using local content.
>  
> (for $w in
>    cts:words((), (),
>      cts:and-query((
>        cts:document-query($user-doc-uri),
>        cts:word-query(doc('terms.xml')//term/string()))))
>  order by cts:frequency($w)
>  return $w)[1 to 20]
>  
> -Will
>  
> From: [email protected] 
> [mailto:[email protected]] On Behalf 
> [email protected]
> Sent: Thursday, May 24, 2012 9:05 AM
> To: [email protected]
> Subject: [MarkLogic Dev General] Keyword matching strategy
>  
> I have a requirement where the end user would like to add “tags” to 
> individual documents.
>  
> I’m maintaining a separate domain specific list of terms which I suggest to 
> the user as potential tags they can select to apply to the document.
> 
> This list of terms is around 4000 items long, and it will continue to grow.
>  
> What I want to do ->
>  
> 1. user creates a document
> 2. execute a search against that document with each of these 4000 terms
> 3. use results to suggest tags to the user that are already part of the 
> document, so they don’t have to think of them on their own
>  
> I tried running search:search 4000 times against the one document. It just 
> timed out (which makes sense).
>  
> I know there has to be a better way to do this. Any suggestions?
>  
> Thanks!
>  
> Matt
> _______________________________________________
> General mailing list
> [email protected]
> http://community.marklogic.com/mailman/listinfo/general
