Thanks for the replies. I'm using 4.0-1 on 64-bit Windows 2003 Server.

I think it is a language thing. Setting the lang option in the stemmed query does change the number of results.

I'm surprised that stemming has the effect of limiting the search to one language, I expected it would still run the search on content in other languages but the stemming wouldn't be of any help. Even better would be if the stemming was dynamic based on the content language.

The consequences are worrying for general searching. I have content in multiple languages and would like the user to be able to enter search terms and receive results in any language. Is the only way to fix this to turn off stemming? I guess I could set the xml:lang attribute to "en" for every article...

Thanks
Rob

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Mary Holstege
Sent: 05 February 2009 20:13
To: General Mark Logic Developer Discussion
Subject: Re: [MarkLogic Dev General] stemmed searches

On Thu, 05 Feb 2009 09:58:19 -0800, Michael Blakeley <[email protected]> wrote:

> Rob,
>
> It's always a good idea to state which server release you are using, and
> on which OS.
>
> The behavior you've observed doesn't look right to me, but I couldn't
> easily reproduce it either. That suggests that something
> content-specific or version-specific is at work: if you have a support
> contract, I'd suggest that you contact support.

One possibility: Stemmed searches search within a particular language, in this case the default, most likely English. If for some reason the element in question is in some other language (e.g. an xml:lang="fr" on the Article element), then that "2009" would be in some other language, and therefore wouldn't show up on a stemmed English word-query.

//Mary

>
> Meanwhile, you might try some other approaches. Would
> cts:element-value-query() be appropriate for this use-case? Or perhaps a
> simple XPath?
>
> /Journal/Volume/Issue/Article/PublishDate/Year[. eq 2009]
>
> If a word-query is what you want, it would be more efficient to write
> this as an element-word-query:
>
> cts:search(
>   /Journal/Volume/Issue/Article/PublishDate,
>   cts:element-word-query(xs:QName('Year'), "2009", ("unstemmed"), 1)
> )
>
> thanks,
> -- Mike
>
> On 2009-02-05 07:14, Whitby, Rob, CMG wrote:
>> Can anyone explain why these 2 queries return different results?
>>
>> count(
>>   cts:search(
>>     /Journal/Volume/Issue/Article/PublishDate/Year,
>>     cts:word-query("2009", ("unstemmed"), 1)
>>   )
>> )
>>
>> = 3036 (the correct result)
>>
>> count(
>>   cts:search(
>>     /Journal/Volume/Issue/Article/PublishDate/Year,
>>     cts:word-query("2009", ("stemmed"), 1)
>>   )
>> )
>>
>> = 2757
>>
>> Why is the "stemmed" setting causing some matches to be missed?
>
> _______________________________________________
> General mailing list
> [email protected]
> http://xqzone.com/mailman/listinfo/general

_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general
