I guess there are a few points to consider:

- it is impossible to stem with total accuracy using rules alone

- combining a rule-based stemmer with a dictionary could also be
error-prone. Unrelated words can have the same stem: consider "saw", the
past tense of "see", and "saw", the stem of "sawing" (cutting wood).

- Stemming would be even more error-prone in Spanish: inflexion in
Spanish changes the root more often than in English.
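
To make the first two points concrete, here is a toy sketch in Python. It
is not Snowball or Porter: the two-suffix rule set (rule_stem) and the
irregular-form table (IRREGULAR, dict_then_rules) are hypothetical and made
up purely for illustration. Rules alone leave "ate" untouched, and adding a
dictionary entry for "saw" fixes the past tense of "see" but wrongly maps
the tool "saw" onto "see" as well.

```python
# Toy illustration only, NOT Snowball/Porter. The suffix list and the
# irregular-form table below are hypothetical, chosen to show the problem.

SUFFIXES = ("ing", "s")  # deliberately tiny rule set

# Hypothetical irregular-form dictionary: surface form -> lemma.
IRREGULAR = {
    "ate": "eat",
    "eaten": "eat",
    "saw": "see",  # right for the past tense of "see", wrong for the tool
}


def rule_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least 3 characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def dict_then_rules(word: str) -> str:
    """Look the word up in the irregular table first, else apply the rules."""
    word = word.lower()
    if word in IRREGULAR:
        return IRREGULAR[word]
    return rule_stem(word)


if __name__ == "__main__":
    for w in ["eating", "eats", "ate", "saw", "sawing", "seeing"]:
        # Columns: word, rules-only stem, dictionary-plus-rules stem.
        # "ate" stays "ate" with rules alone; with the dictionary, "saw"
        # becomes "see" while "sawing" becomes "saw", so the tool matches
        # "seeing" instead of "sawing".
        print(w, rule_stem(w), dict_then_rules(w))
```

Every entry added to such a table fixes one irregular form but can silently
break an unrelated homograph, which is why a dictionary does not remove the
error, it just moves it.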

Martin Porter goes into a little more detail here:

http://snowball.tartarus.org/texts/introduction.html

Hope this helps,

Damien

> On Tue, 24-04-2007 at 21:49 +0100, [EMAIL PROTECTED]
> wrote:
>>>
>> >> For example, if I search for "eat", I'd like Lucene to find "eating",
>> >> "eaten", "ate", etc.
>>
>> Hi Andrew,
>>
>> The example you provide can only partially be performed using a
>> rule-based stemmer, such as those used by Snowball. Most stemmers are
>> capable of stemming eating, eats, and eaten to eat. However, they will
>> not stem ate to eat.
>>
>> While in theory you could construct some form of dictionary to help
>> with these verbal irregularities, it would be a very complex task.
>>
>
> OK... Hmmm. So then I should assume that, for more complete stemming,
> there are no ready-made, easy-to-use dictionaries available under free
> licenses? I guess I assumed that there would be, given the prevalence of
> free software spelling checkers. Can't the data used by MySpell or the
> like be adapted? Or would a very different sort of dictionary be
> needed?
>
> Thanks,
> Andrew
>
