Hello Rupert,

Thank you for the prompt and informative response. I am already following
the naming convention you mentioned. My tokenizer model is named
fr-token.bin, and it seems to be the one causing the problem. The stack
trace from error.log is below:

22.09.2014 03:26:32.858 *INFO* [qtp1377963382-31]
org.apache.stanbol.enhancer.jobmanager.event.impl.EventJobManagerImpl Error
Message: Enhancement Chain failed because of required Engine
'custom-opennlp-token-fr' failed with Message: Unable to process
ContentItem
'<urn:content-item-sha1-3e41fd4b970d08ee316a95a5b5746e2812acf2c8>' with
Enhancement Engine 'custom-opennlp-token-fr' because the engine was unable
to process the content (Engine class:
org.apache.stanbol.enhancer.engines.opennlp.token.impl.OpenNlpTokenizerEngine)(Reason:
The configured OpenNLP TokenizerModel '{fr-token.bin} is not available'
(OpenNlpTokenizerEngine | name=custom-opennlp-token-fr)!)!
22.09.2014 03:26:32.858 *INFO* [qtp1377963382-31]
org.apache.stanbol.enhancer.jobmanager.event.impl.EventJobManagerImpl
Reported Exception:
org.apache.stanbol.enhancer.servicesapi.EngineException: The configured
OpenNLP TokenizerModel '{fr-token.bin} is not available'
(OpenNlpTokenizerEngine | name=custom-opennlp-token-fr)!
at
org.apache.stanbol.enhancer.engines.opennlp.token.impl.OpenNlpTokenizerEngine.getTokenizer(OpenNlpTokenizerEngine.java:259)
at
org.apache.stanbol.enhancer.engines.opennlp.token.impl.OpenNlpTokenizerEngine.canEnhance(OpenNlpTokenizerEngine.java:154)
at
org.apache.stanbol.enhancer.jobmanager.event.impl.EnhancementJobHandler.processEvent(EnhancementJobHandler.java:250)
at
org.apache.stanbol.enhancer.jobmanager.event.impl.EnhancementJobHandler.handleEvent(EnhancementJobHandler.java:197)
at
org.apache.felix.eventadmin.impl.handler.EventHandlerProxy.sendEvent(EventHandlerProxy.java:412)
at
org.apache.felix.eventadmin.impl.tasks.SyncDeliverTasks.execute(SyncDeliverTasks.java:118)
at
org.apache.felix.eventadmin.impl.tasks.AsyncDeliverTasks$TaskExecuter.run(AsyncDeliverTasks.java:132)
at EDU.oswego.cs.dl.util.concurrent.PooledExecutor$Worker.run(Unknown
Source)
at java.lang.Thread.run(Thread.java:744)

I have configured an instance of the OpenNLP Tokenizer engine with the
following settings (these are the only ones I can configure):
name: custom-opennlp-token-fr
language configuration: fr;model={fr-token.bin}
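
To rule out a simple name mismatch, a check along the following lines can compare the configured value with the files actually present. This is a minimal sketch, not Stanbol's own parsing logic: `parse_language_config` and `model_is_present` are hypothetical helpers, and it assumes that the value after `model=` is looked up literally as a file name in the datafiles directory. Under that assumption the braces in `{fr-token.bin}` would count as part of the name, which would be consistent with the name printed in the log; whether Stanbol really treats them that way is not confirmed here.

```python
import tempfile
from pathlib import Path


def parse_language_config(line):
    """Split 'fr;model=name' into (language, {param: value}).

    Hypothetical helper: it mirrors the 'lang;key=value' shape of the
    engine setting, not Stanbol's actual parser.
    """
    language, *params = [part.strip() for part in line.split(";")]
    parsed = {}
    for param in params:
        key, _, value = param.partition("=")
        parsed[key.strip()] = value.strip()
    return language, parsed


def model_is_present(datafiles_dir, model_name):
    """Return True if a file with exactly this name exists in the directory."""
    return (Path(datafiles_dir) / model_name).is_file()


if __name__ == "__main__":
    # Temporary stand-in for {install-directory}/stanbol/datafiles (assumption).
    with tempfile.TemporaryDirectory() as datafiles:
        (Path(datafiles) / "fr-token.bin").touch()
        for line in ("fr;model={fr-token.bin}", "fr;model=fr-token.bin"):
            language, params = parse_language_config(line)
            found = model_is_present(datafiles, params["model"])
            print(f"{language}: {params['model']!r} present={found}")
```

Pointing `model_is_present` at the real datafiles directory instead of the temporary one shows whether the configured name matches a file there character for character.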

Then, I have configured a custom chain with the following engines:
tika;optional
langdetect
custom-opennlp-sentence-fr (this is set up exactly like the tokenizer above)
custom-opennlp-token-fr
custom-opennlp-pos-fr
custom-opennlp-ner-fr
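
To reproduce the failure outside the web UI, the chain can be called over HTTP. The sketch below only builds the request. It assumes a default launcher listening on http://localhost:8080 and the enhancer's `/enhancer/chain/<name>` endpoint; the chain name `custom-fr-chain` and the helper `build_enhance_request` are hypothetical, since the chain's actual name is not given above.

```python
import urllib.request


def build_enhance_request(base_url, chain_name, text):
    """Build a POST request that sends plain text to a named enhancement chain."""
    url = f"{base_url.rstrip('/')}/enhancer/chain/{chain_name}"
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={
            "Content-Type": "text/plain; charset=utf-8",
            "Accept": "text/turtle",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_enhance_request(
        "http://localhost:8080",  # assumed default launcher address
        "custom-fr-chain",        # hypothetical chain name
        "Paris est la capitale de la France.",
    )
    print(request.get_method(), request.full_url)
    # Uncomment to send the request to a running Stanbol instance:
    # with urllib.request.urlopen(request) as response:
    #     print(response.read().decode("utf-8"))
```

Sending the same French sentence after each configuration change makes it easier to match a failure to the corresponding entry in error.log.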

Am I doing something wrong? Also, any idea about the OpenCalais engines?

Thanks again!
Ghufran


On Mon, Sep 22, 2014 at 11:21 AM, Rupert Westenthaler <
rupert.westentha...@gmail.com> wrote:

> Hi Ghufran,
>
>
> On Mon, Sep 22, 2014 at 10:42 AM, Mohammad Ghufran <emghuf...@gmail.com>
> wrote:
> > Hello,
> >
> > I am interested in using Stanbol as part of my Research project but I am
> > having trouble handling languages other than English. I realize that this
> > list is for development and my questions may not be 100% relevant to
> > development, but this is the best place I could find to ask for help. I'd
> > appreciate if someone can guide me a little given that documentation is
> > quite sparse!
> >
> > I am primarily interested in doing named entity recognition in multiple
> > languages (mostly French and English). For this, I found models for
> > French built by someone here:
> > http://enicolashernandez.blogspot.fr/2012/12/apache-opennlp-fr-models.html
> > Models for all the tasks, including segmentation, tokenization, POS
> > tagging, and NER for French, can be found there. What I am unable to do is
> > use these models successfully. From what I gather, all the external models
> > should be put inside the {install-directory}/stanbol/datafiles directory.
>
> That's correct. If you copy the models into this directory, Stanbol
> can find them.
>
> However, the OpenNLP modules expect specific name patterns for model
> files, so make sure that your custom models follow these naming
> schemes:
>
> * Sentence: {lang}-sent.bin (e.g. "fr-sent.bin")
> * Token: {lang}-token.bin (e.g. "fr-token.bin")
> * Pos: {lang}-pos-perceptron.bin or {lang}-pos-maxent.bin, depending on
> whether you use a perceptron or maxent model (e.g. "fr-pos-maxent.bin")
> * Chunker: {lang}-chunker.bin (e.g. "fr-chunker.bin")
> * Namefinder: {lang}-ner-{type}.bin. The default types are
>     * person (e.g. "fr-ner-person.bin")
>     * location (e.g. "fr-ner-location.bin")
>     * organization (e.g. "fr-ner-organization.bin")
>     * for other types see
>
> http://stanbol.apache.org/docs/trunk/components/enhancer/engines/opennlpcustomner
>
> You can use models with other names, but in that case you will need to
> configure those names explicitly on the engines that use them. If you
> want to go this way, please consult the documentation of the
> engines.
>
> * Sentence Detection:
>
> http://stanbol.apache.org/docs/trunk/components/enhancer/engines/opennlpsentence
> * Tokenization:
>
> http://stanbol.apache.org/docs/trunk/components/enhancer/engines/opennlptokenizer
> * Pos Tagging:
> http://stanbol.apache.org/docs/trunk/components/enhancer/engines/opennlppos
> * Chunking:
> http://stanbol.apache.org/docs/trunk/components/enhancer/engines/opennlpchunker
>
> All of those engines allow you to configure the processed languages. Via
> the `model` parameter of a language you can set the name of the model
> file (located in the `stanbol/datafiles/` folder).
>
> Hope this solves your issue
> best
> Rupert
>
> > However, when I create a chain with the new components, I get an error
> > that one of the models was not found. This seems to be arbitrary, since
> > all the models are in the same location but the error doesn't occur for
> > all of them: for example, sentence segmentation with the French model
> > seems to work fine, but tokenization fails. Could someone please help me
> > with how to set up models for other languages? Inside the opennlp
> > directory, there are folders for 'lang' and 'ner'; what are these for
> > precisely?
> >
> > Secondly, I also wanted to investigate using the OpenCalais enhancement
> > engine. There is limited documentation about this, which says that an API
> > key must be obtained. However, I don't see any enhancement engine
> > corresponding to OpenCalais in the OSGi console. Could someone please
> > suggest how I could proceed with configuring this engine?
> >
> > I have compiled Apache Stanbol from source.
> >
> > Best Regards and thanks in advance!
> > Ghufran
>
>
>
> --
> | Rupert Westenthaler             rupert.westentha...@gmail.com
> | Bodenlehenstraße 11                              ++43-699-11108907
> | A-5500 Bischofshofen
> | REDLINK.CO
> ..........................................................................
> | http://redlink.co/
>
