Congrats, Benson! Does Basis primarily use a JNI wrapper to integrate with Lucene? I'm indexing with Hadoop, and it'd be great if it were all in Java... So yeah, "We shall see." :)
Jason

On Wed, Jan 13, 2010 at 7:33 PM, Benson Margulies <[email protected]> wrote:
> I'm a somewhat grizzled software guy. My background is mostly making
> sense of big, messy piles of code. (If confusing, I clarify; if clear
> ...)
>
> I've spent a lot of time on internationalization and performance
> tuning. Over the last year I've had a sort of crash course in NLP.
> Basis Technology, where I work, has always had a certain amount of NLP
> going on, but it's become a more and more important part of what we
> do. In spite of my status as a very, very rusty mathematician, I do my
> best to keep up.
>
> If there's one NLP thing I know something about now, it is named
> entity extraction with averaged perceptrons and passive-aggressive
> training. This has the advantage of being mathematically trivial
> unless you want to prove that it works, which is about as useful as
> proving that bumblebees can (or can't) fly.
>
> At Apache my center of gravity is probably CXF (web services), which I
> wandered into while contributing code to automatically generate
> JavaScript clients for web services.
>
> Ironically, Basis owns a lot of code which is/was built by people who
> believe just the opposite of the Mahout motto -- that cloud
> distribution can overcome the inherent performance disadvantage of
> Java, leaving you with all the other advantages.
>
> We shall see.
>
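To make the "mathematically trivial" remark about averaged perceptrons and passive-aggressive training concrete, here is a minimal Java sketch. It is not Basis Technology's code or any Mahout API; the class name `AveragedPassiveAggressive` and the parameter `c` are hypothetical. It assumes dense feature vectors, binary labels in {-1, +1}, and the PA-I update (hinge loss, step size capped at `c`), combined with plain weight averaging. A real named-entity extractor would use sparse features and multiclass or sequence labels.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: binary passive-aggressive (PA-I) training with weight averaging.
 * Assumes dense features and labels in {-1, +1}; not a production NER trainer.
 */
public class AveragedPassiveAggressive {
    private final double[] w;     // current weight vector
    private final double[] wSum;  // running sum of w after each example, for averaging
    private long steps = 0;       // number of examples processed
    private final double c;       // PA-I aggressiveness: upper bound on the step size

    public AveragedPassiveAggressive(int dim, double c) {
        this.w = new double[dim];
        this.wSum = new double[dim];
        this.c = c;
    }

    private static double dot(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) s += a[i] * b[i];
        return s;
    }

    /** One online step on example (x, y), with y in {-1, +1}. */
    public void update(double[] x, int y) {
        double loss = Math.max(0.0, 1.0 - y * dot(w, x)); // hinge loss
        double norm2 = dot(x, x);
        if (loss > 0.0 && norm2 > 0.0) {
            double tau = Math.min(c, loss / norm2);        // PA-I step size
            for (int i = 0; i < w.length; i++) w[i] += tau * y * x[i];
        }
        for (int i = 0; i < w.length; i++) wSum[i] += w[i]; // accumulate for the average
        steps++;
    }

    /** Mean of the weight vectors seen after each update so far. */
    public double[] averagedWeights() {
        double[] avg = new double[w.length];
        if (steps == 0) return avg;
        for (int i = 0; i < w.length; i++) avg[i] = wSum[i] / steps;
        return avg;
    }

    /** Classifies with the averaged weights, which are usually more stable than w. */
    public int predict(double[] x) {
        return dot(averagedWeights(), x) >= 0.0 ? 1 : -1;
    }

    public static void main(String[] args) {
        // Toy separable data: the label is the sign of the first feature.
        double[][] xs = { {1.0, 0.2}, {-1.0, 0.1}, {0.8, -0.3}, {-0.9, -0.2} };
        int[] ys = { 1, -1, 1, -1 };
        AveragedPassiveAggressive model = new AveragedPassiveAggressive(2, 1.0);
        for (int epoch = 0; epoch < 5; epoch++) {
            for (int i = 0; i < xs.length; i++) model.update(xs[i], ys[i]);
        }
        System.out.println(Arrays.toString(model.averagedWeights()));
        for (int i = 0; i < xs.length; i++) {
            System.out.println("predicted " + model.predict(xs[i]) + ", expected " + ys[i]);
        }
    }
}
```

The whole learning rule is the few lines in `update`: compute a hinge loss, take a step just large enough (up to `c`) to fix the margin, and keep a running sum so the final model is the average of all intermediate weight vectors.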
