El jue, 26-04-2007 a las 09:29 +1000, Daniel Noll escribió:
> [EMAIL PROTECTED] wrote:
> > I guess there are a few points
> > 
> > - it is impossible to stem with total accuracy using rules alone
> > 
> > - combining a rule based stemmer with a dictionary could also be error
> > prone. Unrelated words can have the same stem - consider the past tense of
> > see and the stem of sawing ( cutting wood )
> 
> I guess you could solve some of these by stemming (actually, lemmatising 
> is probably more accurate, since you'll almost certainly need part of 
> speech information to do it with any accuracy) to the wordnet number, 
> for which the meaning is unique.
> 
> But there are still ambiguous sentences where you don't know which word 
> it is.  To use the example just given, "I saw wood" is pretty ambiguous, 
> without having more context.
> 
> Daniel
> 
> 

Great... thanks, everyone, for your replies. For now we've decided to
stick to the standard Snowball rule-based stemmer, but we'll be keeping
our eyes open for other options that may come up.

- Andrew Green


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to