On Mon, 15 Dec 2014 02:31:51 -0800, [email protected] <[email protected]>
wrote:
> Hi
>
> we're trying to build a search that would find all words müller, muller and
> mueller using any of the three words. We've got müller and muller working as
> expected, but can't get mueller to play nice. (Or other umlauts flattened with
> ae, ue oe etc.)
>
> What would be the easiest way to achieve this?
>
> Ville

Stemming is case-sensitive. The German word here is a noun, and German
nouns are spelled with a capital letter, so "Mueller" stems to "Müller"
and you get a match for that. If the content or query is all lower case,
the stemmer doesn't recognize it as a word, so "mueller" just stems to
"mueller" and you don't get a match.

If this only affects a handful of words (unlikely here, but just in case),
you could add "mueller" to your client dictionary so it stems the way you
want it to. Your other option is some kind of query expansion. For this
particular case I think the easiest thing to do would be to turn a search
for "mueller" into a search for "mueller" OR "Mueller". Note that this
isn't necessary for verbs with umlauts (or other word classes that German
spells in lowercase).
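
As a rough, untested sketch of that kind of query expansion: the helper
local:expand-case is a name I made up for illustration, and I'm assuming
stemmed searches are enabled on the database, the content is tagged as
German, and you want to search all documents (fn:doc()). Adjust to taste.

```xquery
xquery version "1.0-ml";

(: Hypothetical helper: return the term as typed plus a variant with the
   first letter capitalized, so a lowercase noun also gets its German
   spelling. Duplicates are dropped if the term is already capitalized. :)
declare function local:expand-case($term as xs:string) as xs:string*
{
  fn:distinct-values((
    $term,
    fn:concat(fn:upper-case(fn:substring($term, 1, 1)),
              fn:substring($term, 2))
  ))
};

let $term := "mueller"
return
  cts:search(
    fn:doc(),
    (: OR together one German word query per case variant :)
    cts:or-query(
      for $variant in local:expand-case($term)
      return cts:word-query($variant, "lang=de")
    )
  )
```

This only addresses the capitalization problem described above; it relies
on the stemmer mapping "Mueller" to "Müller" for the rest.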

You can check how stemming is handling various words using cts:stem, e.g.

    cts:stem("Mueller", "de")
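
To compare several spellings in one go, something like the following
should work in Query Console. It is only a sketch; the word list is just
the examples from this thread, and what comes back depends on your
language license and dictionaries.

```xquery
xquery version "1.0-ml";

(: Show each spelling next to the stem(s) the German stemmer returns.
   cts:stem can return more than one stem, hence the string-join. :)
for $word in ("Müller", "Muller", "Mueller", "mueller")
return
  fn:concat($word, " -> ", fn:string-join(cts:stem($word, "de"), ", "))
```

Any two spellings that share a stem should match each other in a stemmed
search.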
//Mary
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general