I have no experience with K-Stem myself, but I do remember the K-Stem people
contributing a piece of code that integrated their K-Stem work with Lucene a
few (2?) years ago.  Their code had some funky license attached, so it never
made it into Lucene, but it was available for download, so you should be able
to try both K-Stem and Porter and compare.

Otis

----- Original Message ----
From: "Yilmazel, Sibel" <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Mon 13 Feb 2006 01:41:52 PM EST
Subject: Stemmer algorithms

Hello all,

We have done some preliminary research on Porter2 and K-stem algorithms
and have some questions.

Porter2 was found to be a 'strong' stemming algorithm: it strips off
both inflectional suffixes (-s, -es, -ed) and derivational suffixes
(-able, -aciousness, -ability). K-Stem seemed to be a 'weak' stemming
algorithm, as it strips off only the inflectional suffixes (-s, -es,
-ed).
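
To make the weak/strong distinction concrete, here is a toy sketch in
Java. It is NOT Porter2 or K-Stem, and it is not Lucene code: the class
name StemmerSketch, the methods weakStem and strongStem, and the
minimum stem length of 3 are all made up for illustration, and the
suffix lists are just the examples given above. Real stemmers apply
many more rules and conditions.

```java
import java.util.Locale;

// Toy illustration only: neither Porter2 nor K-Stem works this simply.
class StemmerSketch {
    // Inflectional suffixes from the examples above, longest first.
    private static final String[] INFLECTIONAL = {"es", "ed", "s"};
    // Derivational suffixes from the examples above, longest first.
    private static final String[] DERIVATIONAL = {"aciousness", "ability", "able"};

    // Strip the first matching suffix, but only if at least 3 characters remain
    // (an arbitrary threshold assumed for this sketch).
    static String strip(String word, String[] suffixes) {
        for (String suffix : suffixes) {
            int stemLength = word.length() - suffix.length();
            if (word.endsWith(suffix) && stemLength >= 3) {
                return word.substring(0, stemLength);
            }
        }
        return word;
    }

    // "Weak" stemming: remove inflectional suffixes only.
    static String weakStem(String word) {
        return strip(word.toLowerCase(Locale.ROOT), INFLECTIONAL);
    }

    // "Strong" stemming: remove inflectional, then derivational suffixes.
    static String strongStem(String word) {
        return strip(weakStem(word), DERIVATIONAL);
    }

    public static void main(String[] args) {
        String[] words = {"walked", "boxes", "readable", "readability"};
        for (String w : words) {
            System.out.println(w + " -> weak: " + weakStem(w)
                    + ", strong: " + strongStem(w));
        }
    }
}
```

With this sketch the weak variant leaves "readable" and "readability"
untouched, while the strong variant conflates both to "read". That
conflation is the kind of behavior the weak/strong trade-off is about.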

In IR, it is usually recommended to use a "weak" stemmer, as a "weak"
stemmer seldom hurts performance and usually provides a significant
improvement in precision.

However, Porter2 is the most widely used stemming algorithm AND it is a
'strong' stemmer, which is contrary to what is said above.

Can you share your ideas on and experiences with stemmer algorithms?
Thanks in advance.

Sibel

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




