My other doc looks like this. This is probably what I should be using:

<or-query xmlns="http://marklogic.com/cts">
  <word-query>
    <text>cows</text>
  </word-query>
  <word-query>
    <text>tigers</text>
  </word-query>
  <word-query>
    <text>bears</text>
  </word-query>
  <word-query>
    <text>10 commandments</text>
  </word-query>
  <word-query>
    <text>awesome</text>
  </word-query>
  <word-query>

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Broekhuis, Matt
Sent: Thursday, May 24, 2012 4:04 PM
To: [email protected]
Subject: Re: [MarkLogic Dev General] Keyword matching strategy

If I have one document with all the search terms, how would I do that?

<keywordMLList xmlns="http://westlegaledcenter.com/MarkLogicSearch">
  <keywordML>
    <keywordId>1</keywordId>
    <keywordText>cows</keywordText>
  </keywordML>
  <keywordML>
    <keywordId>2</keywordId>
    <keywordText>horsies</keywordText>
  </keywordML>
  <keywordML>
    <keywordId>3</keywordId>
    <keywordText>bears</keywordText>
  </keywordML>

I tried

return cts:search(doc('http://someURI/keywordList'),
  cts:reverse-query(text{ doc('targetDocURI')}))

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Michael Blakeley
Sent: Thursday, May 24, 2012 3:52 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Keyword matching strategy

No, with the reverse-query approach you would instead use around 4000 separate query documents. This is what I used to generate fake terms for testing:

for $i in 1 to 4000
return xdmp:document-insert(
  concat('vocabulary/', $i),
  document { cts:word-query(xdmp:integer-to-hex($i)) })

I think you said you have multiple vocabularies? You might use different directory prefixes for different vocabularies. Then you could and-query the reverse-query with a directory-query term.

-- Mike

On 24 May 2012, at 13:42 , <[email protected]> wrote:

> I just got done with the cts walk and it's only taking about 3 or 4 seconds.
> Our documents are not extremely large.
>
> I made a giant or query as an xml document, and passed that in.
>
> I would like to try out the reverse as well. One thing I'm not seeing right
> away, do I still need my big OR-query?
>
> Thank you!
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Michael Blakeley
> Sent: Thursday, May 24, 2012 2:59 PM
> To: MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] Keyword matching strategy
>
> The cts:walk can take some time too, simply because the query is so large. My
> test took about 30-sec for a 100-kB XML document. This could be capped using
> xdmp:elapsed-time and cts:action. I also found that it could be reduced to
> about 8-sec by rebuilding the XML in a simpler form:
>
>   element words {
>     for $w in cts:tokenize($new-document)[. instance of cts:word]
>     return element word { $w } }
>
> Then I remembered the reverse-query feature. With the fast reverse-query
> index enabled, the lookup could be very efficient.
>
>   cts:search(
>     xdmp:directory('vocabulary/', 'infinity'),
>     cts:reverse-query($new-document))
>
> Without the reverse-query index, this took about 10-sec for my test document.
> That can be cut to about 3-sec by using a simplified version of the document.
> So it was already faster than cts:walk.
>
>   cts:search(
>     xdmp:directory('vocabulary/', 'infinity'),
>     cts:reverse-query(text { $new-document }))
>
> Enabling the reverse-query index, both versions were sub-second - in fact,
> less than 100-ms, although the text-node version was still 3x faster than the
> marked-up version. Anyway I think reverse-query is the most efficient
> approach, and enabling fast reverse-query searches makes it very fast.
>
> -- Mike
>
> On 24 May 2012, at 10:40 , Will Thompson wrote:
>
>> Matt,
>>
>> I thought of this solution before I saw Mike's post, but this *would*
>> require that the document be inserted first. It leverages the word lexicon,
>> so it should be fairly fast, although it still took a while when I tried
>> something similar using local content.
>>
>> (for $w in
>>    cts:words((), (),
>>      cts:and-query((
>>        cts:document-query($user-doc-uri),
>>        cts:word-query(doc('terms.xml')//term/string()))))
>>  order by cts:frequency($w)
>>  return $w)[1 to 20]
>>
>> -Will
>>
>> From: [email protected]
>> [mailto:[email protected]] On Behalf
>> [email protected]
>> Sent: Thursday, May 24, 2012 9:05 AM
>> To: [email protected]
>> Subject: [MarkLogic Dev General] Keyword matching strategy
>>
>> I have a requirement where the end user would like to add "tags" to
>> individual documents.
>>
>> I'm maintaining a separate domain-specific list of terms which I suggest to
>> the user as potential tags they can select to apply to the document.
>>
>> This list of terms is around 4000 items long. And it will continue to grow.
>>
>> What I want to do ->
>>
>> 1. user creates a document
>> 2. execute a search against that document with each of these 4000 terms
>> 3. use results to suggest tags to the user that are already part of the
>>    document, so they don't have to think of them on their own
>>
>> I tried running search:search 4000 times against the one document. It just
>> timed out (which makes sense).
>>
>> I know there has to be a better way to do this. Any suggestions?
>>
>> Thanks!
>>
>> Matt
>> _______________________________________________
>> General mailing list
>> [email protected]
>> http://community.marklogic.com/mailman/listinfo/general
