I just got done with the cts:walk, and it's only taking about 3 or 4 seconds. Our documents are not extremely large.
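Below is a minimal sketch of the cts:walk approach with the time cap Mike mentions (xdmp:elapsed-time plus cts:action), assuming MarkLogic XQuery 1.0-ml. The sample document, the three-term vocabulary, and the 5-second budget are hypothetical stand-ins. In practice the terms would come from something like doc('terms.xml')//term/string(), as in Will's snippet.

```xquery
xquery version "1.0-ml";

(: Sketch only. $new-document and $terms are hypothetical stand-ins for the
   user's new document and the ~4000-term tag vocabulary. :)
let $new-document := <doc>The pump housing and the drive shaft were inspected.</doc>
let $terms := ('pump housing', 'drive shaft', 'gearbox')

(: Hypothetical time budget for the walk. :)
let $limit := xs:dayTimeDuration('PT5S')

(: cts:word-query over a sequence of strings matches ANY of them,
   so this single constructor plays the role of the big OR query. :)
return distinct-values(
  cts:walk($new-document, cts:word-query($terms),
    (
      (: Stop walking once the elapsed query time exceeds the budget. :)
      if (xdmp:elapsed-time() gt $limit)
      then xdmp:set($cts:action, "break")
      else (),
      (: Emit the matched text for each hit. :)
      $cts:text
    )))
```

The distinct-values wrapper collapses repeated hits on the same term, so the result is one entry per suggested tag.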
I made a giant OR query as an XML document and passed that in. I would like to try out the reverse-query approach as well. One thing I'm not seeing right away: do I still need my big OR query?

Thank you!

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Michael Blakeley
Sent: Thursday, May 24, 2012 2:59 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Keyword matching strategy

The cts:walk can take some time too, simply because the query is so large. My test took about 30 sec for a 100 kB XML document. This could be capped using xdmp:elapsed-time and cts:action.

I also found that it could be reduced to about 8 sec by rebuilding the XML in a simpler form:

  element words {
    for $w in cts:tokenize($new-document)[. instance of cts:word]
    return element word { $w } }

Then I remembered the reverse-query feature. With the fast reverse-query index enabled, the lookup could be very efficient.

  cts:search(
    xdmp:directory('vocabulary/', 'infinity'),
    cts:reverse-query($new-document))

Without the reverse-query index, this took about 10 sec for my test document. That can be cut to about 3 sec by using a simplified version of the document, so it was already faster than cts:walk.

  cts:search(
    xdmp:directory('vocabulary/', 'infinity'),
    cts:reverse-query(text { $new-document }))

With the reverse-query index enabled, both versions were sub-second (in fact, under 100 ms), although the text-node version was still 3x faster than the marked-up version.

Anyway, I think reverse-query is the most efficient approach, and enabling fast reverse-query searches makes it very fast.

-- Mike

On 24 May 2012, at 10:40, Will Thompson wrote:

> Matt,
>
> I thought of this solution before I saw Mike's post, but this *would* require
> that the document be inserted first. It leverages the word lexicon, so it
> should be fairly fast, although it still took a while when I tried something
> similar using local content.
>
> (for $w in
>    cts:words((), (),
>      cts:and-query((
>        cts:document-query($user-doc-uri),
>        cts:word-query((doc('terms.xml')//term/string())))))
>  order by cts:frequency($w)
>  return $w)[1 to 20]
>
> -Will
>
> From: [email protected]
> [mailto:[email protected]] On Behalf Of
> [email protected]
> Sent: Thursday, May 24, 2012 9:05 AM
> To: [email protected]
> Subject: [MarkLogic Dev General] Keyword matching strategy
>
> I have a requirement where the end user would like to add "tags" to
> individual documents.
>
> I'm maintaining a separate domain-specific list of terms, which I suggest to
> the user as potential tags they can select to apply to the document.
>
> This list of terms is around 4000 items long, and it will continue to grow.
>
> What I want to do:
>
> 1. user creates a document
> 2. execute a search against that document with each of these 4000 terms
> 3. use results to suggest tags to the user that are already part of the
> document, so they don't have to think of them on their own
>
> I tried running search:search 4000 times against the one document. It just
> timed out (which makes sense).
>
> I know there has to be a better way to do this. Any suggestions?
>
> Thanks!
>
> Matt
> _______________________________________________
> General mailing list
> [email protected]
> http://community.marklogic.com/mailman/listinfo/general
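Matt asked whether the big OR query is still needed. With reverse queries, the vocabulary is stored as many small saved queries, one per term, and the new document is matched against all of them in a single cts:search. The sketch below assumes MarkLogic XQuery 1.0-ml and a terms.xml holding <term> elements, as in Will's snippet. The 'vocabulary/' URI scheme follows Mike's search; the <term-query> wrapper element and the new document's URI are hypothetical. The two statements are separated by a semicolon, so the load commits as its own transaction before the lookup runs.

```xquery
xquery version "1.0-ml";

(: One-time load: store each term as its own saved cts:query document.
   A cts:query placed in element content is stored in its XML form.
   <term-query> and the numbered URIs are hypothetical choices. :)
for $term at $i in doc('terms.xml')//term/string()
return xdmp:document-insert(
  concat('vocabulary/', $i, '.xml'),
  <term-query term="{$term}">{ cts:word-query($term) }</term-query>);

xquery version "1.0-ml";

(: Lookup: find every stored term query that the new document matches.
   The URI below is a hypothetical stand-in for the user's new document. :)
let $new-document := doc('/user-docs/example.xml')
for $hit in cts:search(
    xdmp:directory('vocabulary/', 'infinity'),
    (: Text-node form, which Mike found faster than the marked-up form. :)
    cts:reverse-query(text { $new-document }))
return $hit/term-query/@term/string()
```

No OR query is built at match time. Each new term only needs one more stored query document. According to Mike's timings, enabling the fast reverse-query setting on the database is what brings this lookup under a second.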
