I think it would be cleaner to use a directory prefix in the document URIs with the date encoded in it, something like 'vocabulary/2012-05-24/'. You can insert the new queries into today's directory, run any tests, switch the production configuration to today's directory, and finally xdmp:directory-delete the old ones. You might even leave the old ones around for a day or two, in case you spot problems and need to roll back.
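A minimal sketch of that rollover, assuming one query document per term as discussed elsewhere in this thread. The 'vocabulary/' prefix and the hex test terms come from the thread; the date-stamped URI layout, the 'caf' probe text, and the old directory date in the final comment are illustrative assumptions, not a tested procedure.

```xquery
xquery version "1.0-ml";

(: Step 1: load today's query documents under a date-stamped directory.
   The hex strings are the fake test terms from this thread, standing in
   for a real vocabulary. :)
let $today := fn:substring(fn:string(fn:current-date()), 1, 10)
let $dir := fn:concat('vocabulary/', $today, '/')
for $i in 1 to 4000
return xdmp:document-insert(
  fn:concat($dir, $i),
  document { cts:word-query(xdmp:integer-to-hex($i)) })
;

xquery version "1.0-ml";

(: Step 2: test against today's directory only, before switching the
   production configuration over to it. :)
let $today := fn:substring(fn:string(fn:current-date()), 1, 10)
let $dir := fn:concat('vocabulary/', $today, '/')
return cts:search(
  xdmp:directory($dir, 'infinity'),
  cts:reverse-query(text { 'caf' }))

(: Step 3, run separately once production points at the new directory
   and a rollback is no longer needed (hypothetical old date):
   xdmp:directory-delete('vocabulary/2012-05-23/') :)
```

Keeping the delete as a separate, later step is what preserves the rollback path: until it runs, switching production back to the previous directory is only a configuration change.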
But I was curious about fragments so I looked into it. First, http://docs.marklogic.com/5.0doc/docapp.xqy#display.xqy?fname=http://pubs/5.0doc/xml/search-dev-guide/alerts.xml does not say anything about fragments. So the behavior is undocumented and might not be reliable, especially across releases. But this works for me with a fragment root on 'vocabulary-term'. For production use I think a namespace would be a good idea. I would *not* try to fragment on cts:word-query, since that is very likely to cause problems down the road.

xdmp:document-insert(
  'vocabulary/1',
  element vocabulary {
    for $i in 1 to 4000
    return element vocabulary-term {
      cts:word-query(xdmp:integer-to-hex($i)) }})
=>
()

cts:search(
  xdmp:directory('vocabulary/', 'infinity')//vocabulary-term,
  cts:reverse-query(text { 'caf' } ))
=>
<vocabulary-term>
  <cts:word-query xmlns:cts="http://marklogic.com/cts">
    <cts:text xml:lang="en">caf</cts:text>
  </cts:word-query>
</vocabulary-term>

As far as I can tell from the query-meters output, this uses the reverse-query index. Again, the docs aren't clear on whether or not this should work, so it may not be supported for production use. I would check with support before relying on it.

This approach may also be a shade slower than the multidocument approach. If you don't have the reverse-query index, it could even be slower than cts:walk.

-- Mike

On 24 May 2012, at 14:53 , <[email protected]> wrote:

> There's no way to fake these fragments being separate docs to the cts:query?
>
> It just makes it easier on me since I will be pushing (overwriting) this
> single doc every day using XCC in a different environment. Otherwise I have
> to deal with the issues of deleting documents that are no longer "valid",
> etc..
>
> Thanks for all your help again.
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Michael Blakeley
> Sent: Thursday, May 24, 2012 4:49 PM
> To: MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] Keyword matching strategy
>
> Right, break that up into multiple documents, one per word-query. Otherwise
> the search on reverse-query will merely tell you whether or not the new
> document matches the entire or-query.
>
> -- Mike
>
> On 24 May 2012, at 14:05 , <[email protected]> wrote:
>
>> My other doc look like this. Probably this is what I should be using
>>
>> <or-query xmlns="http://marklogic.com/cts">
>>   <word-query>
>>     <text>cows</text>
>>   </word-query>
>>   <word-query>
>>     <text>tigers</text>
>>   </word-query>
>>   <word-query>
>>     <text>bears</text>
>>   </word-query>
>>   <word-query>
>>     <text>10 commandments</text>
>>   </word-query>
>>   <word-query>
>>     <text>awesome</text>
>>   </word-query>
>>   <word-query>
>>
>> -----Original Message-----
>> From: [email protected]
>> [mailto:[email protected]] On Behalf Of Broekhuis, Matt
>> Sent: Thursday, May 24, 2012 4:04 PM
>> To: [email protected]
>> Subject: Re: [MarkLogic Dev General] Keyword matching strategy
>>
>> If I have one document with all the search terms, how would I do that?
>>
>> <keywordMLList xmlns="http://westlegaledcenter.com/MarkLogicSearch">
>>   <keywordML>
>>     <keywordId>1</keywordId>
>>     <keywordText>cows</keywordText>
>>   </keywordML>
>>   <keywordML>
>>     <keywordId>2</keywordId>
>>     <keywordText>horsies</keywordText>
>>   </keywordML>
>>   <keywordML>
>>     <keywordId>3</keywordId>
>>     <keywordText>bears</keywordText>
>>   </keywordML>
>>
>> I tried
>>
>> return cts:search(doc('http://someURI/keywordList'), cts:reverse-query(text{
>> doc('targetDocURI')}))
>>
>> -----Original Message-----
>> From: [email protected]
>> [mailto:[email protected]] On Behalf Of Michael
>> Blakeley
>> Sent: Thursday, May 24, 2012 3:52 PM
>> To: MarkLogic Developer Discussion
>> Subject: Re: [MarkLogic Dev General] Keyword matching strategy
>>
>> No, with the reverse-query approach you would instead use around 4000
>> separate query documents. This is what I used to generate fake terms for
>> testing:
>>
>> for $i in 1 to 4000
>> return xdmp:document-insert(
>>   concat('vocabulary/', $i),
>>   document { cts:word-query(xdmp:integer-to-hex($i)) })
>>
>> I think you said you have multiple vocabularies? You might use different
>> directory prefixes for different vocabularies. Then you could and-query the
>> reverse-query with a directory-query term.
>>
>> -- Mike
>>
>> On 24 May 2012, at 13:42 , <[email protected]> wrote:
>>
>>> I just got done with the cts walk and its only taking about 3 or 4 seconds.
>>> Our documents are not extremely large.
>>>
>>> I made a giant or query as an xml document, and passed that in.
>>>
>>> I would like to try out the reverse as well. One thing I'm not seeing right
>>> away, do I still need my big OR-query?
>>>
>>> Thank you !
>>>
>>> -----Original Message-----
>>> From: [email protected]
>>> [mailto:[email protected]] On Behalf Of Michael
>>> Blakeley
>>> Sent: Thursday, May 24, 2012 2:59 PM
>>> To: MarkLogic Developer Discussion
>>> Subject: Re: [MarkLogic Dev General] Keyword matching strategy
>>>
>>> The cts:walk can take some time too, simply because the query is so large.
>>> My test took about 30-sec for a 100-kB XML document. This could be capped
>>> using xdmp:elapsed-time and cts:action. I also found that it could be
>>> reduced to about 8-sec by rebuilding the XML in a simpler form:
>>>
>>> element words {
>>>   for $w in cts:tokenize($new-document)[. instance of cts:word]
>>>   return element word { $w } }
>>>
>>> Then I remembered the reverse-query feature. With the fast reverse-query
>>> index enabled, the lookup could be very efficient.
>>>
>>> cts:search(
>>>   xdmp:directory('vocabulary/', 'infinity'),
>>>   cts:reverse-query($new-document))
>>>
>>> Without the reverse-query index, this took about 10-sec for my test
>>> document. That can be cut to about 3-sec by using a simplified version of
>>> the document. So it was already faster than cts:walk.
>>>
>>> cts:search(
>>>   xdmp:directory('vocabulary/', 'infinity'),
>>>   cts:reverse-query(text { $new-document }))
>>>
>>> Enabling the reverse-query index, both versions were sub-second - in fact,
>>> less than 100-ms, although the text-node version was still 3x faster than
>>> the marked-up version. Anyway I think reverse-query is the most efficient
>>> approach, and enabling fast reverse-query searches makes it very fast.
>>>
>>> -- Mike
>>>
>>> On 24 May 2012, at 10:40 , Will Thompson wrote:
>>>
>>>> Matt,
>>>>
>>>> I thought of this solution before I saw Mike's post, but this *would*
>>>> require that the document be inserted first. It leverages the word
>>>> lexicon, so it should be fairly fast, although it still took a while when
>>>> I tried something similar using local content.
>>>>
>>>> (for $w in
>>>>    cts:words((),(),
>>>>      cts:and-query((
>>>>        cts:document-query($user-doc-uri),
>>>>        cts:word-query((doc('terms.xml')//term/string())))))
>>>>  order by (cts:frequency($w))
>>>>  return $w)[1 to 20]
>>>>
>>>> -Will
>>>>
>>>> From: [email protected]
>>>> [mailto:[email protected]] On Behalf
>>>> [email protected]
>>>> Sent: Thursday, May 24, 2012 9:05 AM
>>>> To: [email protected]
>>>> Subject: [MarkLogic Dev General] Keyword matching strategy
>>>>
>>>> I have a requirement where the end user would like to add "tags" to
>>>> individual documents.
>>>>
>>>> I'm maintaining a separate domain specific list of terms which I suggest
>>>> to the user as potential tags they can select to apply to the document.
>>>>
>>>> This list of terms is around 4000 items long. And it will continue to grow.
>>>>
>>>> What I want to do ->
>>>>
>>>> 1. user creates a document
>>>> 2. execute a search against that document with each of these 4000 terms
>>>> 3. use results to suggest tags to the user that are already part of the
>>>>    document, so they don't have to think of them on their own
>>>>
>>>> I tried running search:search 4000 times against the one document. It just
>>>> timed out (which makes sense)
>>>>
>>>> I know there has to be a better way to do this. Any suggestions?
>>>>
>>>> Thanks!
>>>>
>>>> Matt
>>>> _______________________________________________
>>>> General mailing list
>>>> [email protected]
>>>> http://community.marklogic.com/mailman/listinfo/general
