Hello,

I am interested in using Stanbol as part of my research project, but I am
having trouble handling languages other than English. I realize that this
list is for development and that my questions may not be 100% relevant to
development, but this is the best place I could find to ask for help. I'd
appreciate it if someone could guide me a little, given that the
documentation is quite sparse!

I am primarily interested in doing named entity recognition in multiple
languages (mostly French and English). For this, I found a model for
French built by someone here:
http://enicolashernandez.blogspot.fr/2012/12/apache-opennlp-fr-models.html
Models for all the tasks, including segmentation, tokenization, POS, and
NER for French, can be found there. What I am unable to do is use these
models successfully. From what I gather, all external models should be put
inside the {install-directory}/stanbol/datafiles directory. However, when I
create a chain with the new components, I get an error saying that one of
the models was not found. This seems arbitrary: all the models are in the
same location, but the error doesn't occur for all of them. For example,
sentence segmentation with the French model seems to work fine, but
tokenization fails. Could someone please help me with how to set up models
for other languages? Inside the opennlp directory there are folders named
'lang' and 'ner'; what are these for, precisely?

Secondly, I also wanted to investigate using the OpenCalais enhancement
engine. The limited documentation on it says that an API key must be
obtained. However, I don't see any enhancement engine corresponding to
OpenCalais in the OSGi console. Could someone please suggest how I could
proceed with configuring this engine?

I have compiled Apache Stanbol from source.

Best Regards and thanks in advance!
Ghufran
