Re: couple of Sandbox components

Thilo Götz Mon, 29 Nov 2010 00:23:41 -0800

On 11/26/2010 09:35, Tommaso Teofili wrote:
> Hi all,
> following Burn's proposal for a multimodal analysis component skeleton, I also
> have a couple of components to propose for inclusion in the sandbox:
> 
>    - Solr CAS Consumer - to write CAS types/features into Solr fields.
>    This could go inside Lucas or into a separate project.
>    - a Simple Language Annotator - to detect the language of the document
>    text; it can use three algorithms:
>       - the language identification capability of Tika 0.8
>       - the Alchemy language annotator
>       - per-language stopword dictionaries
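As a rough illustration of the third option, a stopword-based guesser can count how many tokens of the text appear in each language's stopword list and pick the language with the most hits. The Java sketch below is a minimal example under stated assumptions: the class name `StopwordLanguageGuesser` and the tiny stopword lists are hypothetical, and this is not code from the proposed annotator, Tika, or the Alchemy service. Returning `x-unspecified` when nothing matches mirrors the value UIMA uses for an unknown document language.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class StopwordLanguageGuesser {

    // Hypothetical, deliberately tiny stopword lists; a real annotator would load
    // much larger per-language dictionaries from resource files.
    // LinkedHashMap keeps iteration order fixed, so ties are resolved deterministically.
    private static final Map<String, Set<String>> STOPWORDS = new LinkedHashMap<>();
    static {
        STOPWORDS.put("en", Set.of("the", "and", "of", "to", "is", "in", "that", "it"));
        STOPWORDS.put("de", Set.of("der", "die", "das", "und", "ist", "nicht", "ein", "zu"));
        STOPWORDS.put("it", Set.of("il", "la", "di", "che", "e", "non", "un", "per"));
    }

    /** Returns the language whose stopword list matches the most tokens, or "x-unspecified" if none match. */
    public static String guessLanguage(String text) {
        String best = "x-unspecified"; // assumed fallback, mirroring UIMA's unknown-language value
        int bestHits = 0;
        // Lowercase and split on any run of non-letter characters.
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+");
        for (Map.Entry<String, Set<String>> entry : STOPWORDS.entrySet()) {
            int hits = 0;
            for (String token : tokens) {
                if (entry.getValue().contains(token)) {
                    hits++;
                }
            }
            // Only a strictly higher count replaces the current best guess.
            if (hits > bestHits) {
                bestHits = hits;
                best = entry.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(guessLanguage("The cat is in the garden and it is happy."));      // en
        System.out.println(guessLanguage("Der Hund ist nicht im Haus und das ist gut."));    // de
        System.out.println(guessLanguage("Il gatto non dorme in casa e la porta resta chiusa.")); // it
    }
}
```

This approach is cheap and needs no training data, but it tends to be unreliable on very short texts, where few or no stopwords occur.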


Hi Tommaso,

do you know what algorithm Tika uses for language identification?
I'm wondering how well it does.  I'm very much in favor of having
an out-of-the-box language ID annotator for UIMA.

--Thilo

> 
> What do you think?
> Regards,
> Tommaso
>
