The basic approach is to expand your query so that it searches across all of the 
languages you are interested in. For example, if a user enters the search terms:

cat chat

and your content is in English and French, you can expand them into the 
following cts:query:

cts:or-query((
  cts:and-query((cts:word-query("cat", "lang=en"),  
                 cts:word-query("chat", "lang=en"))),
  cts:and-query((cts:word-query("cat", "lang=fr"),  
                 cts:word-query("chat", "lang=fr")))
))

It is up to you how you decide to parse the user input.
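For free-text input, the expansion can be generated rather than written by 
hand. The following is a minimal sketch that assumes whitespace-separated terms 
and a fixed list of languages. The function name local:expand-query and the 
("en", "fr") list are illustrative only:

    xquery version "1.0-ml";

    (: Hypothetical helper: builds one and-query per language, each
       containing a language-specific word-query for every term, and
       ORs the per-language queries together. :)
    declare function local:expand-query(
      $input as xs:string,
      $langs as xs:string*
    ) as cts:query
    {
      (: naive parsing: split the normalized input on single spaces :)
      let $terms := fn:tokenize(fn:normalize-space($input), " ")
      return
        cts:or-query(
          for $lang in $langs
          return
            cts:and-query(
              for $term in $terms
              return cts:word-query($term, fn:concat("lang=", $lang))
            )
        )
    };

    (: builds the same query as the hand-written example above :)
    local:expand-query("cat chat", ("en", "fr"))

A real parser would also need to handle phrases, quoting, and operators; this 
sketch only shows the per-language expansion.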

-Danny

-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of Whitby, Rob, CMG
Sent: Tuesday, February 10, 2009 9:08 AM
To: General Mark Logic Developer Discussion
Subject: RE: [MarkLogic Dev General] stemmed searches

Can anyone help me with this issue? What is the best way to deal with content 
in multiple languages?

Thanks
Rob


-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of Whitby, Rob, CMG
Sent: 06 February 2009 11:41
To: General Mark Logic Developer Discussion
Subject: RE: [MarkLogic Dev General] stemmed searches

Thanks for the replies.

I'm using MarkLogic Server 4.0-1 on 64-bit Windows Server 2003.

I think it is a language issue. Setting the lang option in the stemmed query 
does change the number of results. I'm surprised that stemming limits the 
search to one language; I expected it would still run the search on content in 
other languages, with the stemming simply not helping there. Even better would 
be stemming that adapts dynamically to the language of the content.

The consequences are worrying for general searching. I have content in multiple 
languages and would like the user to be able to enter search terms and receive 
results in any language. Is the only way to fix this to turn off stemming?

I guess I could set the xml:lang attribute to "en" for every article...

Thanks
Rob



-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of Mary Holstege
Sent: 05 February 2009 20:13
To: General Mark Logic Developer Discussion
Subject: Re: [MarkLogic Dev General] stemmed searches

On Thu, 05 Feb 2009 09:58:19 -0800, Michael Blakeley 
<[email protected]> wrote:

> Rob,
>
> It's always a good idea to state which server release you are using, 
> and on which OS.
>
> The behavior you've observed doesn't look right to me, but I couldn't 
> easily reproduce it either. That suggests that something 
> content-specific or version-specific is at work: if you have a support 
> contract, I'd suggest that you contact support.

One possibility:

Stemmed searches match within a particular language, in this case the default, 
most likely English. If for some reason the element in question is in some 
other language (e.g. because of an xml:lang="fr" on the Article element), then 
that "2009" would be treated as a word in that language, and therefore wouldn't 
match a stemmed English word-query.
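
One way to test this explanation is to compare the stemmed count for each 
candidate language and to list the xml:lang values actually present in the 
content. A sketch, assuming the element paths from the original question and 
that English and French are the only languages in play:

    xquery version "1.0-ml";

    (
      (: stemmed count of Year elements matching "2009", per language;
         the ("en", "fr") list is an assumption, adjust to your content :)
      for $lang in ("en", "fr")
      return fn:concat($lang, ": ", fn:count(
        cts:search(
          /Journal/Volume/Issue/Article/PublishDate/Year,
          cts:word-query("2009", ("stemmed", fn:concat("lang=", $lang)), 1)
        )
      ))
    ),
    (: distinct xml:lang values declared anywhere under Journal :)
    fn:distinct-values(/Journal//@xml:lang)

If a non-English value shows up and the non-English count accounts for the 
missing matches, the language setting is the cause.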

//Mary

>
> Meanwhile, you might try some other approaches. Would
> cts:element-value-query() be appropriate for this use-case? Or perhaps 
> a simple XPath?
>
>    /Journal/Volume/Issue/Article/PublishDate/Year[. eq "2009"]
>
> If a word-query is what you want, it would be more efficient to write 
> this as an element-word-query:
>
>   cts:search(
>     /Journal/Volume/Issue/Article/PublishDate,
>     cts:element-word-query(xs:QName('Year'), "2009", ("unstemmed"), 1)
>   )
>
> thanks,
> -- Mike
>
> On 2009-02-05 07:14, Whitby, Rob, CMG wrote:
>> Can anyone explain why these 2 queries return different results?
>>
>> count(
>>    cts:search(
>>      /Journal/Volume/Issue/Article/PublishDate/Year,
>>      cts:word-query("2009", ("unstemmed"), 1)
>>    )
>> )
>>
>> = 3036 (the correct result)
>>
>> count(
>>    cts:search(
>>      /Journal/Volume/Issue/Article/PublishDate/Year,
>>      cts:word-query("2009", ("stemmed"), 1)
>>    )
>> )
>>
>> = 2757
>>
>> Why is the "stemmed" setting causing some matches to be missed?
>
> _______________________________________________
> General mailing list
> [email protected]
> http://xqzone.com/mailman/listinfo/general

