On Thu, 26-04-2007 at 09:29 +1000, Daniel Noll wrote:
> [EMAIL PROTECTED] wrote:
> > I guess there are a few points
> >
> > - it is impossible to stem with total accuracy using rules alone
> >
> > - combining a rule-based stemmer with a dictionary could also be error
> > prone. Unrelated words can have the same stem - consider the past tense of
> > see and the stem of sawing (cutting wood)
>
> I guess you could solve some of these by stemming (actually, lemmatising
> is probably more accurate, since you'll almost certainly need part of
> speech information to do it with any accuracy) to the WordNet number,
> for which the meaning is unique.
>
> But there are still ambiguous sentences where you don't know which word
> it is. To use the example just given, "I saw wood" is pretty ambiguous,
> without having more context.
>
> Daniel
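
As a rough sketch of the point in the quoted message (not code from anyone in this thread): a suffix-stripping rule maps "saw" and "sawing" to the same stem, while a lookup keyed on part of speech can tell "see" from "saw". The suffix list, the LEMMAS table, and the part-of-speech labels below are all hypothetical and only for illustration. They are not the Snowball algorithm or WordNet data.

```python
# Toy illustration only: the suffix rule and lookup table are hypothetical
# and far smaller than any real stemmer or dictionary.

def toy_stem(word):
    """Strip a few common suffixes, ignoring meaning and part of speech."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Hypothetical lemma table keyed by (surface form, part of speech).
LEMMAS = {
    ("saw", "verb_past"): "see",     # "I saw wood" = I perceived wood
    ("saw", "verb_present"): "saw",  # "I saw wood" = I cut wood
    ("saw", "noun"): "saw",          # the tool
    ("sawing", "verb"): "saw",
}


def toy_lemmatise(word, pos):
    """Look up a lemma by word and part of speech; fall back to the word."""
    return LEMMAS.get((word, pos), word)


# The rule-based stem cannot separate the two senses.
print(toy_stem("saw"), toy_stem("sawing"))

# With part-of-speech information the two readings come apart.
print(toy_lemmatise("saw", "verb_past"), toy_lemmatise("saw", "verb_present"))
```

Note that the part-of-speech label still has to come from somewhere. For a sentence like "I saw wood", a tagger without more context has no basis for choosing between the two verb readings, which is the remaining ambiguity Daniel describes.
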

Great... thanks, everyone, for your replies. For now we've decided to
stick to the standard Snowball rule-based stemmer, but we'll be keeping
our eyes open for other options that may come up.

- Andrew Green