Hello,

I am interested in using Stanbol as part of my research project, but I am having trouble handling languages other than English. I realize that this list is for development and my questions may not be entirely relevant to development, but this is the best place I could find to ask for help. I'd appreciate it if someone could guide me a little, given that the documentation is quite sparse!
I am primarily interested in named entity recognition in multiple languages (mostly French and English). For French, I found models built by someone here: http://enicolashernandez.blogspot.fr/2012/12/apache-opennlp-fr-models.html . That page provides French models for all the tasks, including sentence segmentation, tokenization, POS tagging, and NER.

What I am unable to do is use these models successfully. From what I gather, all external models should be put inside the {install-directory}/stanbol/datafiles directory. However, when I create a chain with the new components, I get an error saying that one of the models was not found. This seems arbitrary: all the models are in the same location, yet the error does not occur for every model. For example, sentence segmentation with the French model seems to work fine, but tokenization fails.

Could someone please explain how to set up models for other languages? Also, inside the opennlp directory there are folders named 'lang' and 'ner'. What exactly are these for?

Secondly, I also wanted to investigate the OpenCalais enhancement engine. The limited documentation available says that an API key must be obtained. However, I don't see any enhancement engine corresponding to OpenCalais in the OSGi console. Could someone please suggest how I could proceed with configuring this engine? I compiled Apache Stanbol from source.

Best regards, and thanks in advance!
Ghufran