Hi,

If you name your models according to the default naming scheme, you can also use the default configurations of the engines.
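
A minimal Python sketch (not part of Stanbol) for checking whether model files follow the default names listed in the quoted message below. The regular expressions, the function names, and the assumption that {lang} is a two- or three-letter lowercase code are illustrative only and are not taken from the Stanbol sources.

```python
import re
from pathlib import Path

# Default file-name schemes as listed in the quoted message below.
# Assumption: {lang} is a 2-3 letter lowercase code; {type} is lowercase letters.
DEFAULT_PATTERNS = {
    "sentence": re.compile(r"^[a-z]{2,3}-sent\.bin$"),
    "token": re.compile(r"^[a-z]{2,3}-token\.bin$"),
    "pos": re.compile(r"^[a-z]{2,3}-pos-(perceptron|maxent)\.bin$"),
    "chunker": re.compile(r"^[a-z]{2,3}-chunker\.bin$"),
    "namefinder": re.compile(r"^[a-z]{2,3}-ner-[a-z]+\.bin$"),
}


def classify_model(filename):
    """Return the model kind whose default pattern matches, or None."""
    for kind, pattern in DEFAULT_PATTERNS.items():
        if pattern.match(filename):
            return kind
    return None


def report(datafiles_dir):
    """Map every *.bin file in datafiles_dir to its kind (None if misnamed)."""
    return {
        path.name: classify_model(path.name)
        for path in sorted(Path(datafiles_dir).glob("*.bin"))
    }
```

Here classify_model("fr-token.bin") gives "token", while a misnamed file such as "fr-sent-bin" gives None, meaning it would need either a rename or an explicit `model` setting in the engine configuration. Calling report() on the stanbol/datafiles directory lists every .bin file with its detected kind.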
On Mon, Sep 22, 2014 at 12:35 PM, Mohammad Ghufran <emghuf...@gmail.com> wrote:
> (OpenNlpTokenizerEngine | name=custom-opennlp-token-fr)! and
> I have configured an instance of Open Nlp Tokenizer with the following
> settings (which are the only things I can configure):
> name: custom-opennlp-token-fr
> language configuration: fr;model={fr-token.bin}

You need to configure

    fr;model=fr-token.bin

without the curly brackets. But as "fr-token.bin" is the default anyway, you can just remove the line altogether.

>
> Am I doing it wrong? Also, any idea about the Open Calais engines?
>

If you use the full launcher ... do the following

* open http://localhost:8080/system/console/configMgr
* search for "OpenCalais" and click the edit button on the right side
* add your license information
* click OK and you should have an opencalais engine available

best
Rupert

> Thanks again!
> Ghufran
>
>
> On Mon, Sep 22, 2014 at 11:21 AM, Rupert Westenthaler <
> rupert.westentha...@gmail.com> wrote:
>
>> Hi Ghufran,
>>
>>
>> On Mon, Sep 22, 2014 at 10:42 AM, Mohammad Ghufran <emghuf...@gmail.com>
>> wrote:
>> > Hello,
>> >
>> > I am interested in using Stanbol as part of my Research project but I am
>> > having trouble handling languages other than English. I realize that this
>> > list is for development and my questions may not be 100% relevant to
>> > development, but this is the best place I could find to ask for help. I'd
>> > appreciate if someone can guide me a little given that documentation is
>> > quite sparse!
>> >
>> > I am primarily interested in doing named entity recognition in multiple
>> > languages (French, and English mostly). For this, I found a model for
>> > french built by someone here:
>> > http://enicolashernandez.blogspot.fr/2012/12/apache-opennlp-fr-models.html
>> > . Models for all the tasks including segmentation, tokenization, POS, and
>> > NER for French can be found here. What I am unable to achieve is to
>> > successfully use these models.
>> > From what I gather, all the external models
>> > should be put inside the {install-directory}/stanbol/datafiles directory.
>>
>> That's correct. If you copy the models into this directory they can be
>> found by Stanbol.
>>
>> However, the OpenNLP modules use specific name patterns for model
>> files. So make sure that your custom models follow these naming
>> schemes:
>>
>> * Sentence: {lang}-sent.bin (e.g. "fr-sent.bin")
>> * Token: {lang}-token.bin (e.g. "fr-token.bin")
>> * Pos: {lang}-pos-perceptron.bin or {lang}-pos-maxent.bin depending on
>>   whether you use a perceptron or maxent model (e.g. "fr-pos-maxent.bin")
>> * Chunker: {lang}-chunker.bin (e.g. "fr-chunker.bin")
>> * Namefinder: {lang}-ner-{type}.bin. The default types are
>>     * person (e.g. "fr-ner-person.bin")
>>     * location (e.g. "fr-ner-location.bin")
>>     * organization (e.g. "fr-ner-organization.bin")
>>     * for other types see
>>       http://stanbol.apache.org/docs/trunk/components/enhancer/engines/opennlpcustomner
>>
>> You can use models with other names, but in this case you will need to
>> add explicit configurations with the used names to the engines using
>> those. If you want to opt for this please note the documentation of
>> the engines.
>>
>> * Sentence Detection:
>>   http://stanbol.apache.org/docs/trunk/components/enhancer/engines/opennlpsentence
>> * Tokenization:
>>   http://stanbol.apache.org/docs/trunk/components/enhancer/engines/opennlptokenizer
>> * Pos Tagging:
>>   http://stanbol.apache.org/docs/trunk/components/enhancer/engines/opennlppos
>> * Chunking:
>>   http://stanbol.apache.org/docs/trunk/components/enhancer/engines/opennlpchunker
>>
>> All those engines allow you to configure the processed languages.
>> Via the `model` parameter of a language you can set the name of the model file
>> (located in the `stanbol/datafiles/` folder).
>>
>> Hope this solves your issue
>> best
>> Rupert
>>
>> > However, when I create a chain with the new components, I get an error that
>> > one of the models was not found (this seems to be arbitrary since all the
>> > models are in the same location but the error doesn't occur for all the
>> > models. For example, sentence segmentation with the French model seems to
>> > work fine but tokenization fails). Could someone please help me with how to
>> > set up models for other languages? Inside the opennlp directory, there are
>> > folders for 'lang' and 'ner', what are these for precisely?
>> >
>> > Secondly, I also wanted to investigate using OpenCalais enhancement engine.
>> > There is limited documentation about this which says that an API key must
>> > be obtained. However, I don't see any enhancement engine corresponding to
>> > OpenCalais in the OSGi console. Could someone please suggest how I could
>> > proceed with configuring this engine?
>> >
>> > I have compiled Apache Stanbol from source.
>> >
>> > Best Regards and thanks in advance!
>> > Ghufran
>>
>>
>>
>> --
>> | Rupert Westenthaler             rupert.westentha...@gmail.com
>> | Bodenlehenstraße 11                             ++43-699-11108907
>> | A-5500 Bischofshofen
>> | REDLINK.CO
>> ..........................................................................
>> | http://redlink.co/