Hi Stefan
On Wed, May 14, 2014 at 4:08 PM, Stefan Bunk <stefan.b...@student.hpi.uni-potsdam.de> wrote:
> Hi,
>
> I have problems with using the Custom NER Model Extraction Engine [1].
> Basically, no entities are not found, even though the underlying model is
> correct.
>
> Here's what I did:
> 1. I build a custom NER model for places from geonames.org according to
> the OpenNLP website [2]. I tested my model with the OpenNLP command line
> tool, and it worked (i.e. I give my model a text and the entities are found
> correctly).
> 2. I copied the model to both ./launchers/stanbol/datafiles/geonames.bin
> and ./enhancement-engines/topic/engine/sling/datafiles/geonames.bin.

You need to copy the model to the datafiles folder of your Stanbol instance. By default this is "./stanbol/datafiles". So if you run Stanbol in "/foo/bar", the model needs to be available under "/foo/bar/stanbol/datafiles/geonames.bin".

> 3. In the Apache Felix Web Console Configuration, I created a new "Custom
> NER Model" with the following settings:
> - name: Geonames NER

This is the name of the engine. Typically, engine names are either lower case with '-' as word separator or CamelCase. So I suggest using "geonames-ner" as the name for your engine.

> - Name Finder Model: geonames.bin
> - Type Mappings: place > http://dbpedia.org/ontology/Place
> - Ranking: -100
> 4. I build a new enhancement chain with: tika, langdetect,
> opennlp-sentence, opennlp-token, opennlp-pos, opennlp-ner, geonames-ner,
> geonames

Based on the provided information you used "Geonames NER" as the name of your engine. This chain, however, refers to "geonames-ner". I would expect the chain to be unsatisfied, as no "geonames-ner" engine is around.

> 5. Server restart

A server restart is not needed. If you update the model you might need to stop and start the OpenNLP component, as it keeps a SoftReference to the loaded models.

> 6. I send the exactly same string as in 1. when I tested the model, but no
> entities are found.

I would expect a ChainException, as your chain refers to "geonames-ner" and the name of the configured engine is "Geonames NER".

>
> Any hint would be useful!
> How can I check, that Stanbol correctly finds my geonames.bin file? If I
> intentionally add a file which does not exist, no error occurs.

The "Stanbol Data File Provider" tab of the Felix Web Console provides information about requested data files. The Custom NER Model Engine also logs at INFO level.

As I had not used the Custom NER engine for a long time, I tested it with 0.12.1-SNAPSHOT [4], and it worked:

* using [3], the default English place model
* renaming it to geonames-ner.bin
* copying it to the ./stanbol/datafiles folder of my test instance
* configuring a Custom NER engine with

    stanbol.engines.opennlp-ner.typeMappings=["location\ >\ http://dbpedia.org/ontology/Place"]
    stanbol.enhancer.engine.name="geonames-ner"
    stanbol.engines.opennlp-ner.nameFinderModels=["geonames-ner.bin"]

* configuring a Weighted Chain with

    stanbol.enhancer.chain.weighted.chain=["langdetect","opennlp-sentence","opennlp-token","geonames-ner"]
    stanbol.enhancer.chain.name="geonames-ner"

This setup provided the expected results, meaning the exact same list of locations as when using the "opennlp-ner" engine.

As you do not get a ChainException, the most likely reason for your problem is that the "geonames.bin" model is not in the correct folder. As soon as the model is available you should see a message like

    15.05.2014 10:38:28.739 *INFO* [DataFileTrackingDaemon] org.apache.stanbol.enhancer.engines.opennlp.impl.CustomNERModelEnhancementEngine register custom NameFinderModel from resource: geonames-ner.bin for language: en to NamedModelFileListener (name:opennlp-ner)

in the logs.
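
If you prefer to check both conditions from a script instead of the web console, a minimal Python sketch could look like this. It only assumes the default "./stanbol/datafiles" layout and the log message quoted above; the function names are made up for illustration and are not part of Stanbol.

```python
from pathlib import Path


def expected_model_path(stanbol_workdir, model_name):
    """Location where the model is expected, assuming the default
    './stanbol/datafiles' folder relative to the working directory."""
    return Path(stanbol_workdir) / "stanbol" / "datafiles" / model_name


def model_available(stanbol_workdir, model_name):
    """True if the model file exists in the default datafiles folder."""
    return expected_model_path(stanbol_workdir, model_name).is_file()


def model_registered(log_text, model_name):
    """True if the log text contains the INFO message that the Custom NER
    engine writes once the model was picked up (wording taken from the
    log line quoted in this mail)."""
    marker = "register custom NameFinderModel from resource: " + model_name
    return marker in log_text
```

For the example from above you would call model_available("/foo/bar", "geonames.bin") and pass the content of your Stanbol log file to model_registered(). Where that log file lives depends on your launcher configuration, so I left the path out of the sketch.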

Hope this helps.

best
Rupert

>
> Thanks in advance
> Stefan
>
> [1] https://stanbol.apache.org/docs/trunk/components/enhancer/engines/opennlpcustomner
> [2] http://sourceforge.net/apps/mediawiki/opennlp/index.php?title=Name_Finder

[3] http://dev.iks-project.eu/downloads/opennlp/models-1.5/en-ner-location.bin
[4] http://svn.apache.org/repos/asf/stanbol/branches/release-0.12/

--
| Rupert Westenthaler             rupert.westentha...@gmail.com
| Bodenlehenstraße 11             ++43-699-11108907
| A-5500 Bischofshofen
| REDLINK.CO
| http://redlink.co/