I guess there are a few points:

- It is impossible to stem with total accuracy using rules alone.
- Combining a rule-based stemmer with a dictionary could also be
  error-prone. Unrelated words can have the same stem: consider "saw",
  the past tense of "see", and "saw", the stem of "sawing" (cutting wood).
- Stemming would be even more error-prone in Spanish, because inflection
  in Spanish changes the root more often than it does in English.

Martin Porter goes into a little more detail here:
http://snowball.tartarus.org/texts/introduction.html

Hope this helps,
Damien

> On Tue, 24-04-2007 at 21:49 +0100, [EMAIL PROTECTED] wrote:
>> >> For example, if I search for "eat", I'd like Lucene to find "eating",
>> >> "eaten", "ate", etc.
>>
>> Hi Andrew,
>>
>> The example you provide can only partially be handled by a rule-based
>> stemmer, such as those used by Snowball. Most stemmers are capable of
>> stemming "eating", "eats", and "eaten" to "eat". However, they will not
>> stem "ate" to "eat".
>>
>> While in theory you could construct some form of dictionary to help
>> with these verbal irregularities, it would be a very complex task.
>
> OK... Hmmm. So then I should assume that, for more complete stemming,
> there are no ready-made, easy-to-use dictionaries available under free
> licenses? I guess I assumed that there would be, given the prevalence of
> free software spelling checkers. Can't the data used by MySpell or the
> like be adapted? Or is it a very different sort of dictionary that
> would be needed?
>
> Thanks,
> Andrew
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
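
P.S. To make the points about rules and dictionaries concrete, here is a rough Python sketch. It is a toy, not how Snowball or Lucene work: the suffix list, the IRREGULAR table, and the function names rule_stem and stem are all made up for illustration, and nothing here is taken from MySpell data.

```python
# Toy illustration only. The suffix rules and the dictionary below are
# hypothetical; they are not taken from Snowball, Lucene, or MySpell.

# Hand-written irregular forms mapped to their base form.
IRREGULAR = {
    "ate": "eat",
    "saw": "see",  # wrong whenever "saw" means the cutting tool
}

# Suffixes are tried in this order, longest first.
SUFFIXES = ("ing", "en", "s")


def rule_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem(word: str) -> str:
    """Check the irregular-form dictionary first, then fall back to rules."""
    word = word.lower()
    if word in IRREGULAR:
        return IRREGULAR[word]
    return rule_stem(word)


if __name__ == "__main__":
    # Print each word with its rules-only stem and its dictionary-backed stem.
    for w in ("eating", "eats", "eaten", "ate", "saw", "sawing"):
        print(w, rule_stem(w), stem(w))
```

With rules alone, "eating", "eats", and "eaten" all reduce to "eat", but "ate" is left unchanged, and "saw" and "sawing" share the stem "saw" whether or not the past tense of "see" was meant. Adding the dictionary fixes "ate", but it now sends every "saw" to "see", so a search for the cutting tool no longer matches "sawing". A real solution would need part-of-speech or context information to choose between the two readings.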