I am looking for a library that would do basic to reasonably smart tokenization of natural language strings.
Like, if fed something in English or French, it creates tokens for the things between the spaces; for Japanese, it deals with the non-spaced strings in a rule-based fashion. I think Lucene can do that, so Montezuma would be a candidate (?), but I wonder whether any of you has experience with such tools, especially for languages that do not use spaces.

Jean-Christophe Helary

_______________________________________________
Gardeners mailing list
[email protected]
http://www.lispniks.com/mailman/listinfo/gardeners
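P.S. A minimal sketch of the distinction I mean, in plain Python (not any particular library's API): whitespace splitting is enough for space-delimited languages like English or French, but it gets you nowhere with Japanese, which is why a dictionary- or rule-based segmenter is needed there.

```python
def whitespace_tokenize(text):
    """Split on runs of whitespace; adequate for space-delimited languages."""
    return text.split()

# Works for English/French:
print(whitespace_tokenize("the quick brown fox"))
# -> ['the', 'quick', 'brown', 'fox']

# Fails for Japanese: no spaces, so the whole sentence comes back
# as a single "token" and real segmentation rules are required.
print(whitespace_tokenize("私は学生です"))
# -> ['私は学生です']
```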
