> * A Nutch implementation: > http://cvs.sourceforge.net/viewcvs.py/nutch/nutch/src/plugin/languageidentifier/ > > * A Lucene patch: http://issues.apache.org/bugzilla/show_bug.cgi?id=26763
A step in the right direction. It doesn't have other language categories created though. > * JTextCat (http://www.jedi.be/JTextCat/index.html), a Java wrapper > for libtextcat Yes. I saw JTextCat.. I didn't want any JNI used. > * NGramJ (http://ngramj.sourceforge.net/), a general n-gram Java library LGPL.. yuk. That said I think I reviewed this package and found it lacking. I started off just trying to find a library to use in our crawler but never found anything. Which is why I ended up writing my own. > Of these, the Nutch one is certainly under active development, the > others don't seem to be as far as I can tell. They should just use ngramcat :) Kevin -- Kevin A. Burton, Location - San Francisco, CA AIM/YIM - sfburtonator, Web - http://www.feedblog.org/ GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412 --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]