On 11/26/2010 09:35, Tommaso Teofili wrote:
> Hi all,
> following Burn's proposal for a multimodal analysis component skeleton, I also
> have a couple of components to propose for inclusion in the sandbox:
> 
>    - Solr CAS Consumer - to consume CAS/types/features inside Solr fields.
>    This could be put inside Lucas or in a separate project
>    - a Simple Language Annotator - to extract language from document text,
>    this one can use 3 algorithms:
>       - Tika 0.8 language identification capability
>       - Alchemy language annotator
>       - Dictionaries of stopwords for each language

Hi Tommaso,

Do you know which algorithm Tika uses for language identification?
I'm wondering how well it performs.  I'm very much in favor of having
an out-of-the-box language ID annotator for UIMA.

--Thilo

> 
> What do you think?
> Regards,
> Tommaso
> 
