Erik Hatcher wrote:
On Feb 24, 2004, at 12:33 PM, Michael McGrady wrote:
This conversation is a mystery to me. Is there a Porter stemmer other
than the one available in the Lucene source code?
Yes. As mentioned, the Snowball analyzer family lives in the sandbox.
The CVS repository is jakarta-lucene-sandbox; look under
contributions/snowball for more details. Dr. Porter's website explains
why he developed Snowball as a successor to the original Porter stemmer.
Out of curiosity, can anyone comment on how Snowball compares with KStem,
which appeared on the mailing list around this thread:
http://www.mail-archive.com/[EMAIL PROTECTED]/msg03740.html
Also, I thought I read somewhere about new stemmers that can return
multiple stems for a word, but on examination neither KStem nor
Snowball seems to fit this description. Memory fault?
Erik
At 09:03 AM 2/24/2004, you wrote:
On Feb 24, 2004, at 10:03 AM, Grant Ingersoll wrote:
Is there any reason why the PorterStemmer can't be made public? I
know several people have submitted this patch, both separately and
as part of other patches. I, for one, am using it in other places
as part of my overall search solution and I bet others are as well.
I could understand it if all stemmers were non-public, but
GermanStemmer is publicly available, so this seems
inconsistent.
Just wondering...
I think we can make it public. But an alternative is to use the
snowball code in the sandbox, which has a public PorterStemmer.
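To illustrate the kind of reuse Grant describes (calling a stemmer directly, outside an Analyzer), here is a minimal self-contained Java sketch. The Stemmer interface, the Step1aStemmer class and the stemWord helper are hypothetical names chosen for illustration; the setCurrent/stem/getCurrent shape is only loosely modelled on the Snowball stemmers' call pattern and is not Lucene's or Snowball's actual API. Step1aStemmer implements just Step 1a (plural suffixes) of Porter's algorithm, not the full stemmer.

```java
import java.util.Locale;

public class StemmerSketch {

    /** Hypothetical public stemmer contract (assumption, not a real Lucene API). */
    public interface Stemmer {
        void setCurrent(String word);
        boolean stem();          // returns true if the word was changed
        String getCurrent();
    }

    /** Toy stemmer: ONLY Step 1a of Porter's algorithm, not the full stemmer. */
    public static class Step1aStemmer implements Stemmer {
        private String current = "";

        public void setCurrent(String word) {
            current = word.toLowerCase(Locale.ROOT);
        }

        public boolean stem() {
            String before = current;
            if (current.endsWith("sses")) {
                current = current.substring(0, current.length() - 2); // sses -> ss
            } else if (current.endsWith("ies")) {
                current = current.substring(0, current.length() - 2); // ies -> i
            } else if (current.endsWith("ss")) {
                // ss -> ss (left unchanged)
            } else if (current.endsWith("s")) {
                current = current.substring(0, current.length() - 1); // s -> (removed)
            }
            return !current.equals(before);
        }

        public String getCurrent() {
            return current;
        }
    }

    /** Hypothetical helper: stem a single word outside any Analyzer chain. */
    public static String stemWord(Stemmer stemmer, String word) {
        stemmer.setCurrent(word);
        stemmer.stem();
        return stemmer.getCurrent();
    }

    public static void main(String[] args) {
        Stemmer s = new Step1aStemmer();
        for (String w : new String[] {"caresses", "ponies", "caress", "cats"}) {
            System.out.println(w + " -> " + stemWord(s, w));
        }
    }
}
```

Running main prints "caresses -> caress", "ponies -> poni", "caress -> caress" and "cats -> cat". Because the caller only depends on the interface, a full Porter implementation could be swapped in without changing the calling code, which is the practical benefit of the stemmer class being public.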
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]