Hi,

I found the conversation about problems with the English stemmer at http://lists.oasis-open.org/archives/docbook-apps/201103/msg00040.html very informative in tracking down a similar problem I'm having. In my case, the word that isn't being stemmed correctly is "relay". (It comes out as "relai".) This does break searches: searching for "relay" in a document that should have six matches returns the error "Your search returned no results for relai".

The solution that I've implemented locally, and offer below for your consideration, is a list of words to be stemmed manually. I've tried to follow your coding style, but I'm not a serious JavaScript hacker, so I may have stepped on some toes inadvertently.

Regards,

Paul Bort
Systems Engineer
TMW Systems, Inc.
[email protected]

----------------------------------

--- en_stemmer.js
+++ en_stemmer.js
@@ -54,6 +54,14 @@
         meq1 = "^(" + C + ")?" + V + C + "(" + V + ")?$",  // [C]VC[V] is m=1
         mgr1 = "^(" + C + ")?" + V + C + V + C,            // [C]VCVC... is m>1
         s_v = "^(" + C + ")?" + v;                         // vowel in stem
+
+    var exceptionWords = {
+        "relay":"relay",
+        "relaying":"relay",
+        "relays":"relay",
+        "nucleus":"nucleus",
+        "zeus":"zeus"
+    };
 
     return function (w) {
         var stem,
@@ -67,6 +75,8 @@
 
         if (w.length < 3) { return w; }
 
+        if (w in exceptionWords) { return exceptionWords[w]; }
+
         firstch = w.substr(0,1);
         if (firstch == "y") {
             w = firstch.toUpperCase() + w.substr(1);
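
One caveat about the lookup in the patch: a plain `in` test also matches keys inherited from `Object.prototype`, so a search word such as "constructor" would return a function instead of a stem. Below is a minimal sketch of the same exception-list idea with an own-property check. The names `withExceptions` and `toyStemmer` are hypothetical and not part of en_stemmer.js; `toyStemmer` only imitates the y-to-i rewrite so the example runs on its own.

```javascript
// Same exception list as in the patch above.
var exceptionWords = {
    "relay": "relay",
    "relaying": "relay",
    "relays": "relay",
    "nucleus": "nucleus",
    "zeus": "zeus"
};

// Hypothetical helper: wraps any stemmer function with an exception lookup.
function withExceptions(baseStemmer, exceptions) {
    return function (w) {
        // hasOwnProperty ignores inherited keys such as "constructor" or
        // "toString", which a plain `w in exceptions` test would match.
        if (Object.prototype.hasOwnProperty.call(exceptions, w)) {
            return exceptions[w];
        }
        return baseStemmer(w);
    };
}

// Stand-in for the real stemmer, for demonstration only: it rewrites a
// trailing "y" to "i", the step that turns "relay" into "relai".
function toyStemmer(w) {
    return w.replace(/y$/, "i");
}

var stem = withExceptions(toyStemmer, exceptionWords);

console.log(stem("relay"));    // exception list wins
console.log(stem("relaying")); // exception list wins
console.log(stem("happy"));    // falls through to the stand-in stemmer
```

In the real file the same check could replace the `w in exceptionWords` line directly; the wrapper is only there to keep the sketch self-contained.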
