I have more questions about stemming. The query:

let $x := <text xml:lang="fr">sont es ès</text>,
    $query1 := cts:word-query("être", ("lang=fr")),
    $query2 := cts:word-query("suis", ("lang=fr"))
return (
  cts:highlight($x, $query1, element hit {$cts:text}),
  cts:highlight($x, $query2, element hit {$cts:text})
)

produces the results:

<text xml:lang="fr"><hit>sont</hit> <hit>es</hit> ès</text>
<text xml:lang="fr"><hit>sont</hit> <hit>es</hit> <hit>ès</hit></text>

This seems to indicate that stemmed matches take their diacritic sensitivity from the presence or absence of diacritics in the original search term: "être" contains a diacritic, so the first query is diacritic-sensitive and ès is not matched, while "suis" contains none, so the second query is diacritic-insensitive and ès is matched along with es. This seems incorrect, since the stemmer in theory has the correct diacritics for each stemmed form. In this case in particular, ès is completely unrelated to être. Is this behavior something we can change at the database level, or in some other way that does not depend on specifying "diacritic-sensitive" on the base query?
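
For reference, the per-query workaround I would like to avoid would look roughly like the sketch below. It reuses the sample text from above and passes "diacritic-sensitive" explicitly to cts:word-query; I am assuming the explicit option overrides the sensitivity inferred from the search term, and I have not confirmed that this leaves ès unhighlighted.

let $x := <text xml:lang="fr">sont es ès</text>,
    (: "diacritic-sensitive" is passed explicitly here rather than left to be inferred from the search term :)
    $query := cts:word-query("suis", ("lang=fr", "diacritic-sensitive"))
return cts:highlight($x, $query, element hit {$cts:text})

The drawback is that every query built this way has to carry the option, which is why a database-level setting would be preferable.
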
Marc Moskowitz
Interactive Factory

_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general
