Hi all,
Following Burn's proposal for a multimodal analysis component skeleton, I
also have a couple of components to propose for inclusion in the sandbox:
- Solr CAS Consumer - to consume a CAS and store its types/features in
  Solr fields. This could be put inside Lucas or in a separate project.
- Simple Language Annotator - to extract the language from the document
  text. It can use three algorithms:
    - Tika 0.8 language identification capability
    - Alchemy language annotator
    - Dictionaries of stopwords for each language
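
To make the stopword dictionary option a bit more concrete, here is a
minimal sketch of the idea in plain Java. It is not the annotator code
itself: the class name StopwordLanguageGuesser, the guess() method and the
tiny word lists are hypothetical and only for illustration, and it is not
wired into UIMA at all. It simply counts how many tokens of the text appear
in each language's stopword set and picks the language with the most hits.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only: not part of UIMA, Tika or the proposed annotator.
// Uses Set.of, so it needs Java 9 or later.
final class StopwordLanguageGuesser {

    // Tiny illustrative stopword lists; real dictionaries would be much larger.
    private static final Map<String, Set<String>> STOPWORDS = new HashMap<>();
    static {
        STOPWORDS.put("en", Set.of("the", "and", "of", "is", "to", "in"));
        STOPWORDS.put("it", Set.of("il", "la", "di", "che", "e", "per"));
        STOPWORDS.put("de", Set.of("der", "die", "und", "das", "ist", "nicht"));
    }

    private StopwordLanguageGuesser() {
    }

    // Returns the language code with the most stopword hits, or "unknown" if none match.
    static String guess(String text) {
        // Lowercase and split on anything that is not a letter.
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+");
        String best = "unknown";
        int bestHits = 0;
        for (Map.Entry<String, Set<String>> entry : STOPWORDS.entrySet()) {
            int hits = 0;
            for (String token : tokens) {
                if (entry.getValue().contains(token)) {
                    hits++;
                }
            }
            // Strictly greater: on a tie the first language seen is kept.
            if (hits > bestHits) {
                bestHits = hits;
                best = entry.getKey();
            }
        }
        return best;
    }
}
```

In an annotator, the returned code would presumably be set as the document
language on the CAS; short texts or texts mixing languages would need
something smarter than a raw hit count.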
What do you think?
Regards,
Tommaso