Ah, well, a longer story. We sell segmenters and lemmatizers that plug into Lucene. Until recently, it was JNI all the way down. We've delivered a new version to a customer that handles some European languages entirely in Java, and we expect to be able to do this for many more languages this year.

On Thu, Jan 14, 2010 at 12:54 PM, Jason Rutherglen <[email protected]> wrote:
> Congrats Benson!
>
> Basis primarily uses a JNI wrapper to integrate with Lucene? I'm
> indexing using Hadoop and it'd be great if it were all in Java... So
> yeah, "We shall see". :)
>
> Jason
>
> On Wed, Jan 13, 2010 at 7:33 PM, Benson Margulies <[email protected]>
> wrote:
>> I'm a somewhat grizzled software guy. My background is mostly making
>> sense of big, messy, piles of code. (If confusing, I clarify; if clear
>> ...)
>>
>> I've spent a lot of time on internationalization and performance
>> tuning. Over the last year I've had a sort of crash course in NLP.
>> Basis Technology, where I work, has always had a certain amount of NLP
>> going on, but it's become a more and more important part of what we
>> do. In spite of my status as a very, very, rusty mathematician I do my
>> best to keep up.
>>
>> If there's one NLP thing I know something about, now, it is named
>> entity extraction with averaged perceptrons and passive-aggressive
>> training. This has the advantage of being mathematically trivial
>> unless you want to prove that it works, which is as about as useful as
>> proving that bumblebees can (or can't) fly.
>>
>> At Apache my center of gravity is probably CXF (web services), which I
>> wandered into while contributing code to automatically generate
>> Javascript clients for web services.
>>
>> Ironically, Basis owns a lot of code which is/was built by people who
>> believe just the opposite of the Mahout motto -- that cloud
>> distribution can overcome the inherent performance disadvantage of
>> Java, leaving you with all the other advantages.
>>
>> We shall see.
