Forgot to send this to the list, sorry about that.

In addition to what is below, one more question: can we do multilanguage 
searches? We have content in multiple languages, and the end user 
searches with keywords or phrases. We'd like to hit all content, even if 
we specify a certain language for the query (for stemming purposes). Is 
this possible? Or do we need to build a query containing all the terms / 
phrases in every language we support to get all the possible results?

(The other question, pasted here from below, was: "I'll confirm once more 
that there is no way to override in configuration that we'd want the 
stemming to be case insensitive, is that right?")

Ville

------ Original Message ------
From: [email protected]
To: "Mary Holstege" <[email protected]>
Sent: 16.12.2014 8:48:54
Subject: Re[2]: [MarkLogic Dev General] Stemming/diacritics

>  Hi Mary,
>
>Thank you for the help. We'll have to evaluate our options with the 
>client and decide how to tackle this one. I'll confirm once more that 
>there is no way to override in configuration that we'd want the 
>stemming to be case insensitive, is that right? (We know it would 
>result in false positives.)
>
>Ville
>
>------ Original Message ------
>From: "Mary Holstege" <[email protected]>
>To: "MarkLogic Developer Discussion" <[email protected]>; 
>"[email protected]" <[email protected]>
>Sent: 15.12.2014 17:33:36
>Subject: Re: [MarkLogic Dev General] Stemming/diacritics
>
>>On Mon, 15 Dec 2014 02:31:51 -0800, [email protected] 
>><[email protected]> wrote:
>>
>>>Hi
>>>
>>>we're trying to build a search that would find all words müller, 
>>>muller and mueller using any of the three words. We've got müller and 
>>>muller working as expected, but can't get mueller to play nice. (Or 
>>>other umlauts flattened with ae, ue oe etc.)
>>>
>>>What would be the easiest way to achieve this?
>>>
>>>Ville
>>
>>Stemming is case-sensitive. In this case, the German word is a noun 
>>and therefore is spelled with a capital letter, so "Mueller" will stem 
>>to "Müller" and you would get a match for that. If the content or 
>>query is all lower case, the stemmer doesn't recognize it as a word 
>>and so "mueller" just stems to "mueller" and so you don't get a match.
>>
>>If this is just an issue for a handful of words (unlikely in this 
>>case, but just in case), you could add "mueller" to your client 
>>dictionary so it stems the way you want it to. Your other option is 
>>some kind of query expansion. For this particular case I think the 
>>easiest thing to do would be to turn a search for "mueller" into a 
>>search for "mueller" or "Mueller".
>>
>>Note that for verbs with umlauts (or other word classes spelled with 
>>lowercase in German) this wouldn't be necessary.
>>
>>You can check how stemming is handling various words using cts:stem, 
>>e.g.
>>
>>cts:stem("Mueller","de")
>>
>>//Mary
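
The query expansion Mary suggests (turning a search for "mueller" into a search for "mueller" or "Mueller") could look roughly like the sketch below. It is written in Python rather than XQuery and is not a MarkLogic API: the helper names expand_case_variants and expand_query are hypothetical, and the only assumption is that the search layer can OR together the variants of each keyword.

```python
def expand_case_variants(term: str) -> list[str]:
    """Return the term plus its first-letter-capitalised variant, no duplicates.

    Hypothetical helper: German nouns are capitalised, so adding the
    capitalised form gives a case-sensitive stemmer a chance to recognise them.
    """
    variants = [term]
    capitalised = term[:1].upper() + term[1:]
    if capitalised not in variants:
        variants.append(capitalised)
    return variants


def expand_query(query: str) -> list[list[str]]:
    """Split a keyword query on whitespace and expand each word.

    Each inner list is meant to be OR-ed together by the search layer;
    the outer list keeps the original keyword order.
    """
    return [expand_case_variants(word) for word in query.split()]


# expand_case_variants("mueller") gives ['mueller', 'Mueller']
```

Each inner list would then be handed to whatever builds the OR query on the server side. As noted in the thread, matching both cases this way can produce false positives.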

_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
