Hi, I'm still facing this problem.

I'm using lib-search, and the idea of editing its queries to include an and-query for every possible language isn't viable. For a start, I have no way of knowing all the possible languages my content may be in. Disabling stemmed searches isn't an option either, because stemming is one of the key features we rely on. I need to be able to use stemmed searches for English content and, at the same time, return matches from content in other languages.

So here's my current plan, and I'd appreciate feedback on whether there's a better solution:

1. Remove all xml:lang attributes from all content.
2. Replace them with a custom meta tag, something like <meta:Lang>de</meta:Lang>, so that we don't lose the language information but MarkLogic no longer picks it up as the content language.

I don't like this solution, but I can't think of anything else.

Personally, I think this is a poor feature of MarkLogic. Turning stemming on or off should not change which content is searched. Everything should be searched, with content in the configured language gaining the benefits of stemming.

Any comments or suggestions would be really welcome!

Thank you,
Rob

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Danny Sokolsky
Sent: 10 February 2009 18:45
To: General Mark Logic Developer Discussion
Subject: RE: [MarkLogic Dev General] stemmed searches

The basic approach is to expand your search to search across the languages you are interested in. For example, if a user enters the search terms:

    cat chat

and your content is in English and French, then you can expand them into the following cts:query:

    cts:or-query((
      cts:and-query((cts:word-query("cat", "lang=en"),
                     cts:word-query("chat", "lang=en"))),
      cts:and-query((cts:word-query("cat", "lang=fr"),
                     cts:word-query("chat", "lang=fr")))
    ))

It is up to you how you decide to parse the user input.
-Danny

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Whitby, Rob, CMG
Sent: Tuesday, February 10, 2009 9:08 AM
To: General Mark Logic Developer Discussion
Subject: RE: [MarkLogic Dev General] stemmed searches

Can anyone help me with this issue? What is the best way to deal with content in multiple languages?

Thanks,
Rob

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Whitby, Rob, CMG
Sent: 06 February 2009 11:41
To: General Mark Logic Developer Discussion
Subject: RE: [MarkLogic Dev General] stemmed searches

Thanks for the replies. I'm using 4.0-1 on 64-bit Windows 2003 Server.

I think it is a language thing. Setting the lang option in the stemmed query does change the number of results. I'm surprised that stemming has the effect of limiting the search to one language; I expected it would still run the search on content in other languages, with stemming simply not helping there. Even better would be if the stemming were dynamic, based on the content language.

The consequences are worrying for general searching. I have content in multiple languages and would like users to be able to enter search terms and receive results in any language. Is the only way to fix this to turn off stemming? I guess I could set the xml:lang attribute to "en" for every article...

Thanks,
Rob

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Mary Holstege
Sent: 05 February 2009 20:13
To: General Mark Logic Developer Discussion
Subject: Re: [MarkLogic Dev General] stemmed searches

On Thu, 05 Feb 2009 09:58:19 -0800, Michael Blakeley <[email protected]> wrote:

> Rob,
>
> It's always a good idea to state which server release you are using,
> and on which OS.
>
> The behavior you've observed doesn't look right to me, but I couldn't
> easily reproduce it either.
> That suggests that something
> content-specific or version-specific is at work: if you have a support
> contract, I'd suggest that you contact support.

One possibility: stemmed searches search within a particular language, in this case the default, most likely English. If for some reason the element in question is in some other language (e.g. there is an xml:lang="fr" on the Article element), then that "2009" would be in that other language and therefore wouldn't show up in a stemmed English word-query.

//Mary

>
> Meanwhile, you might try some other approaches. Would
> cts:element-value-query() be appropriate for this use-case? Or perhaps
> a simple XPath?
>
>   (: untyped text is compared as xs:string by "eq", so compare to a string :)
>   /Journal/Volume/Issue/Article/PublishDate/Year[. eq "2009"]
>
> If a word-query is what you want, it would be more efficient to write
> this as an element-word-query:
>
>   cts:search(
>     /Journal/Volume/Issue/Article/PublishDate,
>     cts:element-word-query(xs:QName('Year'), "2009", ("unstemmed"), 1)
>   )
>
> thanks,
> -- Mike
>
> On 2009-02-05 07:14, Whitby, Rob, CMG wrote:
>> Can anyone explain why these two queries return different results?
>>
>>   count(
>>     cts:search(
>>       /Journal/Volume/Issue/Article/PublishDate/Year,
>>       cts:word-query("2009", ("unstemmed"), 1)
>>     )
>>   )
>>
>>   = 3036 (the correct result)
>>
>>   count(
>>     cts:search(
>>       /Journal/Volume/Issue/Article/PublishDate/Year,
>>       cts:word-query("2009", ("stemmed"), 1)
>>     )
>>   )
>>
>>   = 2757
>>
>> Why is the "stemmed" setting causing some matches to be missed?
>
> _______________________________________________
> General mailing list
> [email protected]
> http://xqzone.com/mailman/listinfo/general
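Danny's expansion (one stemmed and-query per language, OR'd together) can be generalized so that the language list is not hard-coded, which speaks to Rob's objection that he cannot know every language in advance. The following is a minimal sketch, assuming the MarkLogic 1.0-ml dialect and English as the database default language. local:content-languages and local:expand-query are hypothetical names, and splitting on whitespace stands in for whatever input parsing the application actually does. Collecting languages with //@xml:lang reads every document, so in practice a cached list or an index-backed lookup would likely be a cheaper source.

```xquery
xquery version "1.0-ml";

(: Hypothetical helper: every language tag found in an xml:lang attribute
   anywhere in the database, plus the assumed default language "en" for
   content without xml:lang. This walks all documents, so treat it as a
   sketch; a cached or index-backed list would be cheaper. :)
declare function local:content-languages() as xs:string*
{
  distinct-values(("en", for $a in //@xml:lang return lower-case(string($a))))
};

(: Hypothetical helper: Danny's expansion, generalized. For each language,
   all terms must match (stemmed) in that language; the per-language
   queries are then OR'd together. :)
declare function local:expand-query(
  $terms as xs:string*,
  $langs as xs:string*
) as cts:query
{
  cts:or-query(
    for $lang in $langs
    return cts:and-query(
      for $term in $terms
      return cts:word-query($term, ("stemmed", concat("lang=", $lang)))
    )
  )
};

(: Example: the user typed "cat chat". Whitespace tokenization is an
   assumption, not lib-search's own parser. :)
let $terms := tokenize(normalize-space("cat chat"), " ")
return cts:search(
  /Journal/Volume/Issue/Article,
  local:expand-query($terms, local:content-languages())
)
```

Each additional language adds one and-query, so the cost of the expanded query grows with the number of languages actually present in the content.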
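Another option, offered as an untested suggestion rather than a confirmed MarkLogic recommendation, is to keep xml:lang intact and, for each term, OR a stemmed English word-query with an unstemmed word-query. Rob's own counts (3036 unstemmed versus 2757 stemmed) suggest that the unstemmed word-query was not limited to one language in his setup, so English content would get stemming while content in other languages would still match on exact words. The sketch assumes the unstemmed word-search index is enabled, as it appears to be in Rob's database; local:english-stemmed-or-exact and local:search-query are hypothetical names.

```xquery
xquery version "1.0-ml";

(: Hypothetical helper: match a term either stemmed in English (the
   assumed default language) or unstemmed in any language. Rob's counts
   suggest unstemmed matching was not restricted to a single language. :)
declare function local:english-stemmed-or-exact($term as xs:string) as cts:query
{
  cts:or-query((
    cts:word-query($term, ("stemmed", "lang=en")),
    cts:word-query($term, "unstemmed")
  ))
};

(: Hypothetical helper: every whitespace-separated term must match,
   each in either its stemmed-English or its exact form. :)
declare function local:search-query($input as xs:string) as cts:query
{
  cts:and-query(
    for $term in tokenize(normalize-space($input), " ")
    return local:english-stemmed-or-exact($term)
  )
};

cts:search(/Journal/Volume/Issue/Article, local:search-query("cat chat"))
```

The trade-off is that non-English content only matches exact word forms, which is close to the behavior Rob describes wanting: everything is searched, and only content in the configured language gains the benefits of stemming.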
