So there's no way to present these fragments to the cts:query as if they were separate docs?
It just makes it easier on me since I will be pushing (overwriting) this single doc every day using XCC in a different environment. Otherwise I have to deal with the issues of deleting documents that are no longer "valid", etc.

Thanks for all your help again.

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Michael Blakeley
Sent: Thursday, May 24, 2012 4:49 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Keyword matching strategy

Right, break that up into multiple documents, one per word-query. Otherwise the search on reverse-query will merely tell you whether or not the new document matches the entire or-query.

-- Mike

On 24 May 2012, at 14:05 , <[email protected]> wrote:

> My other doc looks like this. Probably this is what I should be using
>
> <or-query xmlns="http://marklogic.com/cts">
>   <word-query>
>     <text>cows</text>
>   </word-query>
>   <word-query>
>     <text>tigers</text>
>   </word-query>
>   <word-query>
>     <text>bears</text>
>   </word-query>
>   <word-query>
>     <text>10 commandments</text>
>   </word-query>
>   <word-query>
>     <text>awesome</text>
>   </word-query>
>   <word-query>
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Broekhuis, Matt
> Sent: Thursday, May 24, 2012 4:04 PM
> To: [email protected]
> Subject: Re: [MarkLogic Dev General] Keyword matching strategy
>
> If I have one document with all the search terms, how would I do that?
>
> <keywordMLList xmlns="http://westlegaledcenter.com/MarkLogicSearch">
>   <keywordML>
>     <keywordId>1</keywordId>
>     <keywordText>cows</keywordText>
>   </keywordML>
>   <keywordML>
>     <keywordId>2</keywordId>
>     <keywordText>horsies</keywordText>
>   </keywordML>
>   <keywordML>
>     <keywordId>3</keywordId>
>     <keywordText>bears</keywordText>
>   </keywordML>
>
> I tried
>
> return cts:search(doc('http://someURI/keywordList'), cts:reverse-query(text{
> doc('targetDocURI')}))
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Michael Blakeley
> Sent: Thursday, May 24, 2012 3:52 PM
> To: MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] Keyword matching strategy
>
> No, with the reverse-query approach you would instead use around 4000
> separate query documents. This is what I used to generate fake terms for
> testing:
>
> for $i in 1 to 4000
> return xdmp:document-insert(
>   concat('vocabulary/', $i),
>   document { cts:word-query(xdmp:integer-to-hex($i)) })
>
> I think you said you have multiple vocabularies? You might use different
> directory prefixes for different vocabularies. Then you could and-query the
> reverse-query with a directory-query term.
>
> -- Mike
>
> On 24 May 2012, at 13:42 , <[email protected]> wrote:
>
>> I just got done with the cts walk and it's only taking about 3 or 4 seconds.
>> Our documents are not extremely large.
>>
>> I made a giant or query as an xml document, and passed that in.
>>
>> I would like to try out the reverse as well. One thing I'm not seeing right
>> away, do I still need my big OR-query?
>>
>> Thank you!
>>
>> -----Original Message-----
>> From: [email protected]
>> [mailto:[email protected]] On Behalf Of Michael
>> Blakeley
>> Sent: Thursday, May 24, 2012 2:59 PM
>> To: MarkLogic Developer Discussion
>> Subject: Re: [MarkLogic Dev General] Keyword matching strategy
>>
>> The cts:walk can take some time too, simply because the query is so large.
>> My test took about 30-sec for a 100-kB XML document. This could be capped
>> using xdmp:elapsed-time and cts:action. I also found that it could be
>> reduced to about 8-sec by rebuilding the XML in a simpler form:
>>
>> element words {
>>   for $w in cts:tokenize($new-document)[. instance of cts:word]
>>   return element word { $w } }
>>
>> Then I remembered the reverse-query feature. With the fast reverse-query
>> index enabled, the lookup could be very efficient.
>>
>> cts:search(
>>   xdmp:directory('vocabulary/', 'infinity'),
>>   cts:reverse-query($new-document))
>>
>> Without the reverse-query index, this took about 10-sec for my test
>> document. That can be cut to about 3-sec by using a simplified version of
>> the document. So it was already faster than cts:walk.
>>
>> cts:search(
>>   xdmp:directory('vocabulary/', 'infinity'),
>>   cts:reverse-query(text { $new-document }))
>>
>> Enabling the reverse-query index, both versions were sub-second - in fact,
>> less than 100-ms, although the text-node version was still 3x faster than
>> the marked-up version. Anyway I think reverse-query is the most efficient
>> approach, and enabling fast reverse-query searches makes it very fast.
>>
>> -- Mike
>>
>> On 24 May 2012, at 10:40 , Will Thompson wrote:
>>
>>> Matt,
>>>
>>> I thought of this solution before I saw Mike's post, but this *would*
>>> require that the document be inserted first. It leverages the word lexicon,
>>> so it should be fairly fast, although it still took a while when I tried
>>> something similar using local content.
>>>
>>> (for $w in
>>>    cts:words((), (),
>>>      cts:and-query((
>>>        cts:document-query($user-doc-uri),
>>>        cts:word-query(doc('terms.xml')//term/string()))))
>>>  order by cts:frequency($w)
>>>  return $w)[position() le 20]
>>>
>>> -Will
>>>
>>> From: [email protected]
>>> [mailto:[email protected]] On Behalf
>>> [email protected]
>>> Sent: Thursday, May 24, 2012 9:05 AM
>>> To: [email protected]
>>> Subject: [MarkLogic Dev General] Keyword matching strategy
>>>
>>> I have a requirement where the end user would like to add "tags" to
>>> individual documents.
>>>
>>> I'm maintaining a separate domain specific list of terms which I suggest to
>>> the user as potential tags they can select to apply to the document.
>>>
>>> This list of terms is around 4000 items long. And it will continue to grow.
>>>
>>> What I want to do ->
>>>
>>> 1. user creates a document
>>> 2. execute a search against that document with each of these 4000 terms
>>> 3. use results to suggest tags to the user that are already part of the
>>>    document, so they don't have to think of them on their own
>>>
>>> I tried running search:search 4000 times against the one document. It just
>>> timed out (which makes sense)
>>>
>>> I know there has to be a better way to do this. Any suggestions?
>>>
>>> Thanks!
>>>
>>> Matt
>>> _______________________________________________
>>> General mailing list
>>> [email protected]
>>> http://community.marklogic.com/mailman/listinfo/general
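
One way to reconcile the "one query document per term" advice with a keyword list that arrives as a single document each day might be to split the pushed list inside MarkLogic and prune stale entries in the same pass. The following is only a sketch under assumptions, not something posted in the thread: it reuses the 'vocabulary/' prefix from Mike's test and the list URI and element names from Matt's example, and it assumes each keywordId is unique (duplicate ids would map to the same URI and conflict).

```xquery
xquery version "1.0-ml";

(: Sketch only. The list URI and the 'vocabulary/' prefix are assumptions
   borrowed from the examples in the thread. :)
declare namespace k = "http://westlegaledcenter.com/MarkLogicSearch";

let $list := doc('http://someURI/keywordList')/k:keywordMLList

(: One target URI per keyword, keyed on keywordId (assumed unique). :)
let $wanted :=
  for $kw in $list/k:keywordML
  return concat('vocabulary/', $kw/k:keywordId)

(: URIs of the query documents that are already stored. :)
let $existing :=
  for $d in xdmp:directory('vocabulary/', 'infinity')
  return xdmp:node-uri($d)

return (
  (: Insert or overwrite one serialized word-query per keyword. :)
  for $kw in $list/k:keywordML
  return xdmp:document-insert(
    concat('vocabulary/', $kw/k:keywordId),
    document { cts:word-query(string($kw/k:keywordText)) }),

  (: Delete only the query documents whose keyword left the list, so no
     URI is both inserted and deleted in the same statement. :)
  for $uri in $existing[not(. = $wanted)]
  return xdmp:document-delete($uri)
)
```

After a run like this, the cts:search over xdmp:directory('vocabulary/', 'infinity') with cts:reverse-query shown earlier in the thread should return one query document per matching keyword, and the daily XCC push can stay a single-document overwrite.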
