Hi,

The thread about problems with the English stemmer at 
http://lists.oasis-open.org/archives/docbook-apps/201103/msg00040.html was very 
helpful in tracking down a similar problem I'm having. In my case, the word 
that isn't being stemmed correctly is "relay" (it comes out as "relai"). This 
does break searches: searching for "relay" in a document that should have six 
matches returns the error "Your search returned no results for relai".

The solution that I've implemented locally, and offer below for your 
consideration, is a list of exception words that maps each listed word 
directly to its stem, bypassing the normal stemming rules. I've tried to 
follow your coding style, but I'm not a serious JavaScript hacker, so I may 
have stepped on some toes inadvertently.
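
To illustrate the idea separately from the patch, here is a minimal sketch of 
an exception lookup placed in front of a stemmer. The names naiveStem and 
stemWithExceptions are hypothetical and are not part of en_stemmer.js; 
naiveStem only imitates the y-to-i rewrite that produces "relai" and is not 
the real algorithm.

```javascript
// Hypothetical stand-in for the real stemmer: it only imitates the
// trailing y -> i rewrite that turns "relay" into "relai".
function naiveStem(word) {
    return word.replace(/y$/, "i");
}

// Words that should bypass the stemming rules, mapped to the stem to use.
var exceptionWords = {
    "relay": "relay",
    "relaying": "relay",
    "relays": "relay"
};

// Consult the exception list first and fall back to the stemmer otherwise.
function stemWithExceptions(word) {
    // hasOwnProperty (an assumption beyond the patch) ignores inherited
    // keys such as "constructor", which a plain "in" test would match.
    if (Object.prototype.hasOwnProperty.call(exceptionWords, word)) {
        return exceptionWords[word];
    }
    return naiveStem(word);
}
```

With this in place, stemWithExceptions("relaying") returns "relay", while a 
word that is not in the list still goes through the ordinary stemming path.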

Regards,
Paul Bort
Systems Engineer
TMW Systems, Inc.
[email protected]

----------------------------------

--- en_stemmer.js
+++ en_stemmer.js
@@ -54,6 +54,14 @@
         meq1 = "^(" + C + ")?" + V + C + "(" + V + ")?$",  // [C]VC[V] is m=1
         mgr1 = "^(" + C + ")?" + V + C + V + C,       // [C]VCVC... is m>1
         s_v = "^(" + C + ")?" + v;                   // vowel in stem
+
+    var exceptionWords = {
+            "relay":"relay",
+            "relaying":"relay",
+            "relays":"relay",
+            "nucleus":"nucleus",
+            "zeus":"zeus"
+        };

     return function (w) {
         var     stem,
@@ -67,6 +75,8 @@

         if (w.length < 3) { return w; }

+        if (exceptionWords.hasOwnProperty(w)) { return exceptionWords[w]; }
+
         firstch = w.substr(0,1);
         if (firstch == "y") {
             w = firstch.toUpperCase() + w.substr(1);
