Hello Rupert,

Thank you for the prompt and informative response. I already am following the convention you mentioned. I have a model by the name fr-token.bin which seems to be the problem. Stack trace from the error.log below:

22.09.2014 03:26:32.858 *INFO* [qtp1377963382-31] org.apache.stanbol.enhancer.jobmanager.event.impl.EventJobManagerImpl Error Message: Enhancement Chain failed because of required Engine 'custom-opennlp-token-fr' failed with Message: Unable to process ContentItem '<urn:content-item-sha1-3e41fd4b970d08ee316a95a5b5746e2812acf2c8>' with Enhancement Engine 'custom-opennlp-token-fr' because the engine was unable to process the content (Engine class: org.apache.stanbol.enhancer.engines.opennlp.token.impl.OpenNlpTokenizerEngine)(Reason: The configured OpenNLP TokenizerModel '{fr-token.bin} is not available' (OpenNlpTokenizerEngine | name=custom-opennlp-token-fr)!)!
22.09.2014 03:26:32.858 *INFO* [qtp1377963382-31] org.apache.stanbol.enhancer.jobmanager.event.impl.EventJobManagerImpl Reported Exception: org.apache.stanbol.enhancer.servicesapi.EngineException: The configured OpenNLP TokenizerModel '{fr-token.bin} is not available' (OpenNlpTokenizerEngine | name=custom-opennlp-token-fr)!
    at org.apache.stanbol.enhancer.engines.opennlp.token.impl.OpenNlpTokenizerEngine.getTokenizer(OpenNlpTokenizerEngine.java:259)
    at org.apache.stanbol.enhancer.engines.opennlp.token.impl.OpenNlpTokenizerEngine.canEnhance(OpenNlpTokenizerEngine.java:154)
    at org.apache.stanbol.enhancer.jobmanager.event.impl.EnhancementJobHandler.processEvent(EnhancementJobHandler.java:250)
    at org.apache.stanbol.enhancer.jobmanager.event.impl.EnhancementJobHandler.handleEvent(EnhancementJobHandler.java:197)
    at org.apache.felix.eventadmin.impl.handler.EventHandlerProxy.sendEvent(EventHandlerProxy.java:412)
    at org.apache.felix.eventadmin.impl.tasks.SyncDeliverTasks.execute(SyncDeliverTasks.java:118)
    at org.apache.felix.eventadmin.impl.tasks.AsyncDeliverTasks$TaskExecuter.run(AsyncDeliverTasks.java:132)
    at EDU.oswego.cs.dl.util.concurrent.PooledExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Thread.java:744)

I have configured an instance of the OpenNLP Tokenizer with the following settings (which are the only things I can configure):

name: custom-opennlp-token-fr
language configuration: fr;model={fr-token.bin}

Then, I have configured a custom chain with the following engines:

tika;optional
langdetect
custom-opennlp-sentence-fr (this is set up exactly like the tokenizer above)
custom-opennlp-token-fr
custom-opennlp-pos-fr
custom-opennlp-ner-fr

Am I doing it wrong? Also, any idea about the OpenCalais engines?

Thanks again!
Ghufran

On Mon, Sep 22, 2014 at 11:21 AM, Rupert Westenthaler <rupert.westentha...@gmail.com> wrote:

> Hi Ghufran,
>
> On Mon, Sep 22, 2014 at 10:42 AM, Mohammad Ghufran <emghuf...@gmail.com> wrote:
> > Hello,
> >
> > I am interested in using Stanbol as part of my Research project but I am
> > having trouble handling languages other than English. I realize that this
> > list is for development and my questions may not be 100% relevant to
> > development, but this is the best place I could find to ask for help. I'd
> > appreciate if someone can guide me a little given that documentation is
> > quite sparse!
> >
> > I am primarily interested in doing named entity recognition in multiple
> > languages (French, and English mostly). For this, I found a model for
> > French built by someone here:
> > http://enicolashernandez.blogspot.fr/2012/12/apache-opennlp-fr-models.html
> > Models for all the tasks including segmentation, tokenization, POS, and
> > NER for French can be found here. What I am unable to achieve is to
> > successfully use these models. From what I gather, all the external models
> > should be put inside the {install-directory}/stanbol/datafiles directory.
>
> That's correct. If you copy the models in this directory they can be
> found by Stanbol.
>
> However the OpenNLP modules do use specific name patterns for model
> files. So make sure that your custom models do follow such name
> schemes:
>
> * Sentence: {lang}-sent.bin (e.g. "fr-sent.bin")
> * Token: {lang}-token.bin (e.g. "fr-token.bin")
> * Pos: {lang}-pos-perceptron.bin or {lang}-pos-maxent.bin depending on
>   if you use a perceptron or maxent model (e.g. "fr-pos-maxent.bin")
> * Chunker: {lang}-chunker.bin (e.g. "fr-chunker.bin")
> * Namefinder: {lang}-ner-{type}.bin. The default types are
>     * person (e.g. "fr-ner-person.bin")
>     * location (e.g. "fr-ner-location.bin")
>     * organization (e.g. "fr-ner-organization.bin")
>     * for other types see
>       http://stanbol.apache.org/docs/trunk/components/enhancer/engines/opennlpcustomner
>
> You can use models with other names, but in this case you will need to
> add explicit configurations with the used names to the engines using
> those. If you want to opt for this please note the documentation of
> the engines.
>
> * Sentence Detection:
>   http://stanbol.apache.org/docs/trunk/components/enhancer/engines/opennlpsentence
> * Tokenization:
>   http://stanbol.apache.org/docs/trunk/components/enhancer/engines/opennlptokenizer
> * Pos Tagging:
>   http://stanbol.apache.org/docs/trunk/components/enhancer/engines/opennlppos
> * Chunking:
>   http://stanbol.apache.org/docs/trunk/components/enhancer/engines/opennlpchunker
>
> All those engines do allow to configure processed languages. Via the
> `model` parameter of a language you can set the name of the model file
> (located in the `stanbol/datafiles/` folder)
>
> Hope this solves your issue
> best
> Rupert
>
> > However, when I create a chain with the new components, I get an error that
> > one of the models was not found (this seems to be arbitrary since all the
> > models are in the same location but the error doesn't occur for all the
> > models. For example, sentence segmentation with the French model seems to
> > work fine but tokenization fails). Could someone please help me with how to
> > set up models for other languages? Inside the opennlp directory, there are
> > folders for 'lang' and 'ner', what are these for precisely?
> >
> > Secondly, I also wanted to investigate using OpenCalais enhancement engine.
> > There is limited documentation about this which says that an API key must
> > be obtained. However, I don't see any enhancement engine corresponding to
> > OpenCalais in the OSGi console. Could someone please suggest how I could
> > proceed with configuring this engine?
> >
> > I have compiled Apache Stanbol from source.
> >
> > Best Regards and thanks in advance!
> > Ghufran
>
> --
> | Rupert Westenthaler             rupert.westentha...@gmail.com
> | Bodenlehenstraße 11                             ++43-699-11108907
> | A-5500 Bischofshofen
> | REDLINK.CO ..........................................................................
> | http://redlink.co/
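
For anyone following this thread, the naming scheme and the `model` parameter discussed above can be sanity-checked outside of Stanbol with a small script. The sketch below is only an illustration: the helper names (`expected_model_names`, `parse_language_config`, `check_model_entry`) are hypothetical and are not part of Stanbol or OpenNLP, and the parsing of the `lang;param=value` entry is an assumption based on the configuration shown in this message. The brace check is likewise an inference: the engine reports the model name as '{fr-token.bin}', braces included, which may mean the braces are being read as part of the file name. That is a guess from the log, not something confirmed in the thread.

```python
from pathlib import Path

# Default model file-name patterns as listed in the quoted reply.
# The NER entries expand {type} to the three default types named there.
DEFAULT_PATTERNS = {
    "sentence": ["{lang}-sent.bin"],
    "token": ["{lang}-token.bin"],
    "pos": ["{lang}-pos-perceptron.bin", "{lang}-pos-maxent.bin"],
    "chunker": ["{lang}-chunker.bin"],
    "ner": [
        "{lang}-ner-person.bin",
        "{lang}-ner-location.bin",
        "{lang}-ner-organization.bin",
    ],
}


def expected_model_names(lang):
    """Return the default file names per task for a language code."""
    return {
        task: [pattern.format(lang=lang) for pattern in patterns]
        for task, patterns in DEFAULT_PATTERNS.items()
    }


def parse_language_config(entry):
    """Split an entry such as 'fr;model=fr-token.bin' into (lang, params).

    Assumption: parameters are separated by ';' and written as key=value.
    """
    lang, *raw_params = [part.strip() for part in entry.split(";")]
    params = {}
    for raw in raw_params:
        key, _, value = raw.partition("=")
        params[key.strip()] = value.strip()
    return lang, params


def check_model_entry(entry, datafiles_dir):
    """Return a list of human-readable problems for one language entry."""
    _lang, params = parse_language_config(entry)
    model = params.get("model")
    problems = []
    if model is None:
        # No explicit model configured, nothing to check here.
        return problems
    if model.startswith("{") and model.endswith("}"):
        problems.append(
            f"model name {model!r} contains literal braces; "
            "they may be treated as part of the file name"
        )
    if not (Path(datafiles_dir) / model).is_file():
        problems.append(f"no file named {model!r} in {datafiles_dir}")
    return problems


if __name__ == "__main__":
    # Hypothetical path; adjust to your own {install-directory}.
    for problem in check_model_entry("fr;model={fr-token.bin}", "stanbol/datafiles"):
        print(problem)
```

Running the script against the configuration from this message lists whatever it finds suspicious about the entry, and `expected_model_names("fr")` gives the file names to compare against the contents of the datafiles directory.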