Using Nutch LanguageIdentifierPlugin in Apache UIMA

Michael Baessler Wed, 15 Aug 2007 05:00:59 -0700

Hi,

I'm one of the Apache UIMA committers, and while searching for an open-source language detection technology I found the Nutch LanguageIdentifierPlugin.


First, a short introduction to what UIMA is:

UIMA stands for Unstructured Information Management Architecture and is a component architecture and software framework implementation for the analysis of unstructured content like text, video and audio data. The framework has a pluggable architecture to build a chain of analysis engines to analyze the content. For further and more detailed information about UIMA, please refer to the Apache UIMA homepage:

http://incubator.apache.org/uima/

We are interested in wrapping such a language identifier technology as a UIMA analysis engine, so that it can be used to build an analysis chain to analyze text content.

We created a UIMA sandbox to host such analysis engines, so that everybody can pick the engines they are interested in and build an analysis chain for their needs.


Now my questions:

Is there a place where I can find some more details about how your language identification works?

Will it be possible to share the language identification technology so that we can wrap it as a UIMA analysis engine? My current understanding is that it is only available within Nutch, not separately.

Since both projects are hosted on Apache, I don't see any license issues when using your technology. :-)


Thanks for your answers in advance!

-- Michael
