No, with the reverse-query approach you would instead use around 4000 separate
query documents. This is what I used to generate fake terms for testing:
  for $i in 1 to 4000
  return xdmp:document-insert(
    concat('vocabulary/', $i),
    document { cts:word-query(xdmp:integer-to-hex($i)) })
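
For real terms rather than fake ones, the same pattern should apply. The following is only a sketch: it assumes the terms live in a document called terms.xml as <term> elements, which is a guess borrowed from Will's example further down, not something I know about your data.

```xquery
xquery version "1.0-ml";

(: Assumption: terms.xml holds the vocabulary as <term> elements.
   Each term becomes its own stored word-query document. :)
for $term at $i in doc('terms.xml')//term/string()
return xdmp:document-insert(
  concat('vocabulary/', $i),
  document { cts:word-query($term) })
```
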
I think you said you have multiple vocabularies? You might use different
directory prefixes for different vocabularies. Then you could and-query the
reverse-query with a directory-query term.
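
Something like this, as a rough sketch. The 'vocabulary/medical/' prefix is made up for illustration, and $new-document stands for the document being tagged. The trailing path step assumes the stored queries serialize as cts:word-query elements with a cts:text child, which is how I would expect the documents inserted above to look.

```xquery
xquery version "1.0-ml";

(: Hypothetical inputs: $new-document is the document to tag, and
   'vocabulary/medical/' is an invented per-vocabulary directory prefix. :)
declare variable $new-document as node() external;

(: Match only the stored queries under one vocabulary directory,
   then pull the term text out of each matching query document. :)
cts:search(
  doc(),
  cts:and-query((
    cts:directory-query('vocabulary/medical/', 'infinity'),
    cts:reverse-query(text { $new-document }))))
/cts:word-query/cts:text/string()
```

The result would be the matching terms as strings, ready to offer as tag suggestions.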
-- Mike
On 24 May 2012, at 13:42 , <[email protected]> wrote:
> I just finished with the cts:walk, and it's only taking about 3 or 4 seconds.
> Our documents are not extremely large.
>
> I made a giant OR-query as an XML document and passed that in.
>
> I would like to try out the reverse-query as well. One thing I'm not seeing
> right away: do I still need my big OR-query?
>
> Thank you !
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Michael Blakeley
> Sent: Thursday, May 24, 2012 2:59 PM
> To: MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] Keyword matching strategy
>
> The cts:walk can take some time too, simply because the query is so large. My
> test took about 30-sec for a 100-kB XML document. This could be capped using
> xdmp:elapsed-time and cts:action. I also found that it could be reduced to
> about 8-sec by rebuilding the XML in a simpler form:
>
>   element words {
>     for $w in cts:tokenize($new-document)[. instance of cts:word]
>     return element word { $w } }
>
> Then I remembered the reverse-query feature. With the fast reverse-query
> index enabled, the lookup could be very efficient.
>
> cts:search(
> xdmp:directory('vocabulary/', 'infinity'),
> cts:reverse-query($new-document))
>
> Without the reverse-query index, this took about 10-sec for my test document.
> That can be cut to about 3-sec by using a simplified version of the document.
> So it was already faster than cts:walk.
>
> cts:search(
> xdmp:directory('vocabulary/', 'infinity'),
> cts:reverse-query(text { $new-document }))
>
> Enabling the reverse-query index, both versions were sub-second - in fact,
> less than 100-ms, although the text-node version was still 3x faster than the
> marked-up version. Anyway I think reverse-query is the most efficient
> approach, and enabling fast reverse-query searches makes it very fast.
>
> -- Mike
>
> On 24 May 2012, at 10:40 , Will Thompson wrote:
>
>> Matt,
>>
>> I thought of this solution before I saw Mike's post, but this *would*
>> require that the document be inserted first. It leverages the word lexicon,
>> so it should be fairly fast, although it still took a while when I tried
>> something similar using local content.
>>
>> (for $w in
>>    cts:words((), (),
>>      cts:and-query((
>>        cts:document-query($user-doc-uri),
>>        cts:word-query(doc('terms.xml')//term/string()))))
>>  order by cts:frequency($w)
>>  return $w)[1 to 20]
>>
>> -Will
>>
>> From: [email protected]
>> [mailto:[email protected]] On Behalf
>> [email protected]
>> Sent: Thursday, May 24, 2012 9:05 AM
>> To: [email protected]
>> Subject: [MarkLogic Dev General] Keyword matching strategy
>>
>> I have a requirement where the end user would like to add "tags" to
>> individual documents.
>>
>> I'm maintaining a separate domain specific list of terms which I suggest to
>> the user as potential tags they can select to apply to the document.
>>
>> This list of terms is around 4000 items long. And it will continue to grow.
>>
>> What I want to do ->
>>
>> 1. user creates a document
>> 2. execute a search against that document with each of these 4000 terms
>> 3. use results to suggest tags to the user that are already part of the
>> document, so they don't have to think of them on their own
>>
>> I tried running search:search 4000 times against the one document. It just
>> timed out (which makes sense).
>>
>> I know there has to be a better way to do this. Any suggestions?
>>
>> Thanks!
>>
>> Matt
>> _______________________________________________
>> General mailing list
>> [email protected]
>> http://community.marklogic.com/mailman/listinfo/general
>