Rupert Westenthaler created STANBOL-1422:
--------------------------------------------

             Summary: Add support for ixa-nerc NER models
                 Key: STANBOL-1422
                 URL: https://issues.apache.org/jira/browse/STANBOL-1422
             Project: Stanbol
          Issue Type: Bug
            Reporter: Rupert Westenthaler
            Assignee: Rupert Westenthaler


The ixa-pipe-nerc [1] module provides good-quality Named Entity Recognition 
models for English, Spanish, Dutch, German and Italian. However, to use those 
models one needs:

* OpenNLP 1.6.0
* OpenNLP extensions provided by the ixa-pipe-nerc module.

OpenNLP 1.6.0 has not been released yet, so we will need to go with a SNAPSHOT 
version for now. The ixa-pipe-nerc module does not support OSGi, so we will 
need to embed the required classes into a bundle and provide a bundle activator 
that registers the extensions as OSGi services (with the metadata expected by 
OpenNLP).

NOTE: This issue only covers the extensions to Apache Stanbol that are needed 
so that one can use the provided models. To use the models, users will need to 
download the ~700 MB archive linked on [1], extract the OpenNLP models (*.bin 
files) and put them into the datafiles folder of Apache Stanbol.
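
The manual step above could be scripted. The following is only a minimal 
sketch, assuming the archive has already been unpacked into a local directory: 
the class name {{ModelInstallerSketch}}, the default source directory 
{{nerc-models}} and the default target {{stanbol/datafiles}} are illustrative 
assumptions, not something provided by Stanbol or ixa-pipe-nerc.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Stanbol or ixa-pipe-nerc.
final class ModelInstallerSketch {

    /**
     * Copies every *.bin file found below extractedDir into datafilesDir
     * (flattening sub-directories) and returns the copied file names,
     * ordered by their source path. Files with the same name in different
     * sub-directories overwrite each other.
     */
    static List<String> copyModels(Path extractedDir, Path datafilesDir) throws IOException {
        Files.createDirectories(datafilesDir);
        List<Path> models;
        try (Stream<Path> walk = Files.walk(extractedDir)) {
            models = walk.filter(Files::isRegularFile)
                         .filter(p -> p.getFileName().toString().endsWith(".bin"))
                         .sorted()
                         .collect(Collectors.toList());
        }
        List<String> copied = new ArrayList<>();
        for (Path model : models) {
            Path target = datafilesDir.resolve(model.getFileName().toString());
            Files.copy(model, target, StandardCopyOption.REPLACE_EXISTING);
            copied.add(target.getFileName().toString());
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        // Both defaults are assumptions; pass your own locations as arguments.
        Path extracted = Paths.get(args.length > 0 ? args[0] : "nerc-models");
        Path datafiles = Paths.get(args.length > 1 ? args[1] : "stanbol/datafiles");
        copyModels(extracted, datafiles).forEach(System.out::println);
    }
}
```

Running it with the unpacked archive directory and the Stanbol datafiles 
folder as arguments prints the names of the model files that were copied.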

The models use PER, ORG, LOC and MISC as types, so a configuration for the 
CustomNERModelEnhancementEngine like the following should do the trick:

{code}
# Configuration of org.apache.stanbol.enhancer.engines.opennlp.impl.CustomNERModelEnhancementEngine-ixa_nec.config
stanbol.engines.opennlp-ner.typeMappings=["PER\ >\ http://dbpedia.org/ontology/Person","ORG\ >\ http://dbpedia.org/ontology/Organisation","LOC\ >\ http://dbpedia.org/ontology/Place","MISC\ >\ skos:Concept"]
stanbol.enhancer.engine.name="ixa-nerc"
stanbol.engines.opennlp-ner.nameFinderModels=["de-clusters-dictlbj-conll03.bin","en-91-18-4-class-muc7-conll03-ontonotes-4.0.bin","es-clusters-dictlbj-conll02.bin","it-clusters-evalita09.bin","nl-clusters-dictlbj-conll02.bin","eu-clusters-egunkaria.bin"]
{code}

The names of the OpenNLP model files are the values of the 
{{stanbol.engines.opennlp-ner.nameFinderModels}} property. You will find those 
files in the NERC-Models 1.5.0 file. See the documentation on [1] for more 
details and other options.
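
To illustrate how the {{typeMappings}} entries relate the model tags to types, 
here is a small sketch that splits entries of the form {{TAG > type}} into a 
map. It is not the parser used by the CustomNERModelEnhancementEngine; the 
class name {{TypeMappingSketch}} and its behaviour on malformed entries are 
assumptions made for this example only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; Stanbol uses its own parsing code.
final class TypeMappingSketch {

    /** Splits entries of the form "TAG > type" into a tag-to-type map. */
    static Map<String, String> parse(String... entries) {
        Map<String, String> mappings = new LinkedHashMap<>();
        for (String entry : entries) {
            int sep = entry.indexOf('>');
            if (sep < 0) {
                throw new IllegalArgumentException("Missing '>' in mapping: " + entry);
            }
            String tag = entry.substring(0, sep).trim();
            String type = entry.substring(sep + 1).trim();
            if (tag.isEmpty() || type.isEmpty()) {
                throw new IllegalArgumentException("Incomplete mapping: " + entry);
            }
            mappings.put(tag, type);
        }
        return mappings;
    }

    public static void main(String[] args) {
        // The four mappings from the configuration, with the spaces unescaped.
        Map<String, String> mappings = parse(
                "PER > http://dbpedia.org/ontology/Person",
                "ORG > http://dbpedia.org/ontology/Organisation",
                "LOC > http://dbpedia.org/ontology/Place",
                "MISC > skos:Concept");
        mappings.forEach((tag, type) -> System.out.println(tag + " -> " + type));
    }
}
```

Each tag emitted by a model (PER, ORG, LOC, MISC) is looked up in such a map 
to decide which type is assigned to the recognized entity.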


[1] https://github.com/ixa-ehu/ixa-pipe-nerc/



