Check out langutils, and remind me to do another release soon!
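
In case it helps while you compare libraries, below is a toy sketch of the easy, space-delimited case in plain Common Lisp. It is not langutils' API, just an illustration of what whitespace tokenization amounts to, and it does nothing for unspaced scripts like Japanese, which need dictionary- or rule-based segmentation on top:

;; Toy whitespace tokenizer for space-delimited languages (English,
;; French, ...).  Not langutils; the names here are made up for the
;; example.
(defun whitespace-char-p (ch)
  "True if CH is a space, tab, newline, or carriage return."
  (member ch '(#\Space #\Tab #\Newline #\Return)))

(defun whitespace-tokenize (string)
  "Return the substrings of STRING separated by runs of whitespace."
  (let ((tokens '())
        (start nil))
    (dotimes (i (length string))
      (cond ((and (null start)
                  (not (whitespace-char-p (char string i))))
             ;; a token starts at position I
             (setf start i))
            ((and start (whitespace-char-p (char string i)))
             ;; the current token ends just before position I
             (push (subseq string start i) tokens)
             (setf start nil))))
    ;; flush a token that runs to the end of the string
    (when start
      (push (subseq string start) tokens))
    (nreverse tokens)))

;; (whitespace-tokenize "I am looking for a library")
;; => ("I" "am" "looking" "for" "a" "library")

Lucene-style analyzers (and so, presumably, montezuma's) bundle this kind of tokenizer together with per-language rules, which is why they are worth a look for the Japanese case.
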
On Feb 5, 2007, at 3:11 AM, Jean-Christophe Helary wrote:

> I am looking for a library that would do basic to reasonably smart
> tokenization of natural language strings.
>
> Like, if fed something in English or French, it creates tokens for
> the things between the spaces; for Japanese, it deals with the
> non-spaced strings in a rule-based fashion.
>
> I think Lucene can do that and so montezuma would be a candidate (?),
> but I wonder if any of you has experience with such tools, especially
> for languages that do not use spaces.
>
> Jean-Christophe Helary

_______________________________________________
Gardeners mailing list
[email protected]
http://www.lispniks.com/mailman/listinfo/gardeners
