Erik Hatcher wrote:

On Feb 24, 2004, at 12:33 PM, Michael McGrady wrote:

This conversation is a mystery to me. Is there some different Porter stemmer than the one available in the Lucene source code?


Yes. As mentioned, the Snowball analyzer family lives in the sandbox. The CVS repository is jakarta-lucene-sandbox; look under contributions/snowball for more details. Dr. Porter's website explains why he developed Snowball to succeed the original Porter stemmer.

Out of curiosity, can anyone comment on how Snowball compares with KStem, which appeared on the mailing list around this thread:
http://www.mail-archive.com/[EMAIL PROTECTED]/msg03740.html



Also, I thought I read somewhere about new stemmers that can return multiple stems for a word, but on examination neither KStem nor Snowball seems to fit this description. Memory fault?





Erik



At 09:03 AM 2/24/2004, you wrote:


On Feb 24, 2004, at 10:03 AM, Grant Ingersoll wrote:

Is there any reason why the PorterStemmer can't be made public? I know several people have submitted this patch, both separately and as part of other patches. I, for one, am using it in other places as part of my overall search solution and I bet others are as well. I guess I could understand if all stemmers were that way, but the GermanStemmer is publicly available, so it doesn't seem to be consistent.

Just wondering...


I think we can make it public. But an alternative is to use the Snowball code in the sandbox, which has a public PorterStemmer.



--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]





