cts:stem will show the alternative stems, but basic stemming will only use the 
first stem given.
Stemmed search matching depends on matching stem to stem. In basic stemming, 
that means matching on the first stem; in advanced stemming that means matching 
on any of the stems. So, consider your words here:

cts:stem($word,"fr") =>
mourir = mourir
meurt = mourir
mourant = mourir, mourant
mourrait = mourir

Since "mourir" is the first stem for all these, they will all match each other 
under basic stemming.
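To make the basic-stemming rule concrete, here is a toy Python model. It is not MarkLogic code and not how the server implements stemming. It treats each word's stems as the ordered list reported by cts:stem above. Two words match under basic stemming when their first stems are equal. The names STEMS and basic_match are hypothetical.

```python
# Toy model of basic stemming; NOT MarkLogic's implementation.
# Stem lists copied from the cts:stem($word, "fr") output quoted above.
STEMS = {
    "mourir": ["mourir"],
    "meurt": ["mourir"],
    "mourant": ["mourir", "mourant"],
    "mourrait": ["mourir"],
}


def basic_match(a: str, b: str) -> bool:
    """Basic stemming: only the first stem of each word is compared."""
    return STEMS[a][0] == STEMS[b][0]


# Every word's first stem is "mourir", so every pair matches.
print(sorted(w for w in STEMS if basic_match("mourir", w)))
# prints ['meurt', 'mourant', 'mourir', 'mourrait']
```

The second stem of "mourant" is ignored entirely in this mode. That is why it plays no role in the match.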

disparu = disparu, disparaître
disparues = disparu, disparaître
disparaître = disparaître

Since the first stem "disparu" does not match the first stem "disparaître", 
"disparaître" will not match "disparu" under basic stemming, although it would 
under advanced stemming.

marche = marche, marcher
marcher = marcher

Since the first stem "marche" does not match the first stem "marcher", 
"marche" will not match "marcher" under basic stemming, although it would under 
advanced stemming.
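The difference between the two modes can be sketched with the same kind of toy Python model. Basic compares only the first stems. Advanced treats two words as matching if they share any stem. This is an illustration of the rule described above, not MarkLogic's code. The stem lists are copied from the cts:stem output in this thread, and the function names are hypothetical.

```python
# Toy model contrasting basic and advanced stemming; NOT MarkLogic's code.
# Stem lists copied from cts:stem($word, "fr") as quoted in this thread.
STEMS = {
    "disparu": ["disparu", "disparaître"],
    "disparues": ["disparu", "disparaître"],
    "disparaître": ["disparaître"],
    "marche": ["marche", "marcher"],
    "marcher": ["marcher"],
}


def basic_match(a: str, b: str) -> bool:
    # Basic: compare only the first stem of each word.
    return STEMS[a][0] == STEMS[b][0]


def advanced_match(a: str, b: str) -> bool:
    # Advanced: match if the two words share any stem at all.
    return bool(set(STEMS[a]) & set(STEMS[b]))


for a, b in [("disparu", "disparaître"), ("disparues", "disparu"), ("marche", "marcher")]:
    print(f"{a} / {b}: basic={basic_match(a, b)} advanced={advanced_match(a, b)}")
# disparu / disparaître: basic=False advanced=True
# disparues / disparu: basic=True advanced=True
# marche / marcher: basic=False advanced=True
```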

With respect to "baux" -> "bau": "bau" is actually a French word whose plural 
is "baux", although perhaps an obscure one. Even so, stemming in general 
combines dictionary information with algorithms, and you will occasionally 
turn up cases where the stem you get isn't actually a word. That doesn't 
really matter: what matters is whether the stems match. If a particular word 
is being stemmed in a way that causes trouble for your application, you can 
always add it to your custom dictionary to force a different result.
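The effect of a custom dictionary entry can be modelled as an override that takes precedence over the default stems. This is only a sketch of the idea. It does not show MarkLogic's custom dictionary API or its exact override semantics. The "baux" stems come from cts:stem("baux","fr") in this thread. The stems for "bau" and "bail", and the override itself, are assumptions for illustration.

```python
# Toy model of a custom-dictionary override; NOT MarkLogic's API.
# "baux" stems are from cts:stem("baux", "fr") in this thread; the stems
# for "bau" and "bail" are ASSUMED to be the words themselves.
DEFAULT_STEMS = {
    "baux": ["bau", "bail"],
    "bau": ["bau"],
    "bail": ["bail"],
}

# Hypothetical custom entry forcing "baux" to stem to "bail".
CUSTOM_STEMS = {"baux": ["bail"]}


def stems(word, custom=None):
    # Assumption: a custom entry, when present, replaces the default stems.
    if custom and word in custom:
        return custom[word]
    return DEFAULT_STEMS[word]


def basic_match(a, b, custom=None):
    # Basic stemming: compare first stems only.
    return stems(a, custom)[0] == stems(b, custom)[0]


print(basic_match("baux", "bail"))                # False: first stem is "bau"
print(basic_match("baux", "bail", CUSTOM_STEMS))  # True: override applies
```

In this model the override is a trade-off. With it in place, "baux" no longer matches "bau" under basic stemming.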

In general I would say that if basic stemming is not giving you the search 
recall you want, use advanced stemming. The need depends partly on the 
characteristics of the language and partly on the needs of your application. 
I think French in particular has a lot of words that share the same surface 
form but have different underlying stems, and where the shorter stem (which is 
generally listed first) may not be the most probable choice, so advanced 
stemming could make a big difference for some applications.
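The recall difference can be estimated with a toy Python model. It checks which words from the sample text in Praveen's message would be hit by a query term under each mode. This is not MarkLogic's implementation, and tokenization is skipped in favour of a hand-picked word list. The stem lists come from the cts:stem output quoted in this thread. The exception is "bau" and "bail", whose stems are assumed to be the words themselves. The function name matches is hypothetical.

```python
# Toy recall comparison; NOT MarkLogic's implementation.
# Stems come from cts:stem output quoted in this thread, except "bau"
# and "bail", whose stems are ASSUMED to be the words themselves.
STEMS = {
    "marcher": ["marcher"],
    "marche": ["marche", "marcher"],
    "bau": ["bau"],
    "baux": ["bau", "bail"],
    "bail": ["bail"],
    "disparu": ["disparu", "disparaître"],
    "disparues": ["disparu", "disparaître"],
    "disparaître": ["disparaître"],
}

# Stemmable words from the sample text in the quoted message, in order.
DOC_WORDS = ["marcher", "bau", "baux", "bail", "marche",
             "disparues", "bau", "disparaître", "disparu"]


def matches(query: str, advanced: bool) -> list:
    """Return distinct document words matched by the query term."""
    q = STEMS[query]
    hits = []
    for w in DOC_WORDS:
        s = STEMS[w]
        # Basic: first stems only. Advanced: any shared stem.
        ok = bool(set(q) & set(s)) if advanced else q[0] == s[0]
        if ok and w not in hits:
            hits.append(w)
    return hits


for term in ["disparaître", "marcher", "baux"]:
    print(term, matches(term, advanced=False), matches(term, advanced=True))
```

In this model, basic stemming finds "bau" and "baux" for the query "baux" but not "bail", which is consistent with what Praveen observed. Advanced stemming picks up all three.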

//Mary


On 04/14/2016 12:58 PM, Gontla Praveen wrote:
Hi Mary,

While testing, I found more cases like this when only basic stemming is enabled.

For example, with basic stemming enabled, the term "mourir" matches meurt, 
mourant, mourrait, and mourir.

let $text:= <text xml:lang="fr">marcher avec la bau rupture de baux septembre 
1997, bail marche cette disparues situation bau fait disparaître la 
justification. Les services fournis disparu par la demanderesse l'ont été dans 
l'attente d'une rémunération,</text>
return
  cts:highlight($text,
    cts:query(<cts:word-query>
                <cts:text xml:lang="fr">mourir</cts:text>
                <cts:option>case-insensitive</cts:option>
                <cts:option>diacritic-insensitive</cts:option>
                <cts:option>punctuation-insensitive</cts:option>
              </cts:word-query>),
    <b>{$cts:text}</b>)

Why doesn't the same happen for the terms disparu or marche?

Why is advanced stemming required for these terms? Is this specific to the 
French language?

Also, when I checked the stems with cts:stem("baux","fr"), I got bau, bail, 
where bau doesn't have any meaning in French.

Since only basic stemming is enabled at my database level, I am seeing 
documents that contain baux or bau but not bail.

Can you explain this difference in behaviour for French stems?

Thanks,
Praveen.


_______________________________________________
General mailing list
[email protected]
Manage your subscription at: 
http://developer.marklogic.com/mailman/listinfo/general