Hi Rupert, hi all,

thanks to your hints I was able to track down the problem. First, I checked the engine name and the file location, and both were correct (the wrong name in my original post was just a typo on my part, sorry for that). The file was found correctly. Still, it wasn't working.
What got me on the right track was this log line:

> 15.05.2014 10:38:28.739 *INFO* [DataFileTrackingDaemon] org.apache.stanbol.enhancer.engines.opennlp.impl.CustomNERModelEnhancementEngine register custom NameFinderModel from resource: geonames-ner.bin for language: en to NamedModelFileListener (name:opennlp-ner)

together with the fact that the geonames-ner engine always ran for only 1 ms, which is suspiciously fast given the 5 MB model it has to work through.

The problem is that the texts I send to the chain are quite short, usually only one sentence, and they often contain an obviously non-English name like "Costa de Xurius". This confuses the language detection, which in this example outputs Spanish instead of English. The geonames-ner engine then does not even run, because the text is not in the language the model was trained for.

So, what is the right way to handle this? Can I somehow force the chain to treat the text as English? Simply removing the langdetect engine does not work, because the custom NER model engine depends on it.

----

Furthermore, I am not satisfied with the geonames.org entity linking. Even when the text is correctly classified as English and the location entity is found, the geonames linking fails to link many entities. Example: the text snippet is "University of Buenos Aires", which is the exact name of the entity on geonames.org. Still, I had to lower the confidence threshold to 20% before the geonames engine produced the link (it came back with a confidence of 24%). Many entities are not found at all, even when I use the exact name from geonames.org and the mention is correctly identified as a location. Where should I look to improve the linking performance?

Best,
Stefan
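P.S.: To make the threshold issue concrete, here is a toy sketch in plain Python (not Stanbol internals; the label and the numbers are just the ones from my example above) of why a suggestion scored at 24% only survives once the confidence cut-off is lowered to 20%:

```python
def filter_suggestions(suggestions, min_confidence):
    """Keep only entity-link suggestions at or above the confidence cut-off."""
    return [(label, conf) for label, conf in suggestions if conf >= min_confidence]

# One candidate link for the mention, scored at 24% confidence.
suggestions = [("University of Buenos Aires", 0.24)]

print(filter_suggestions(suggestions, 0.30))  # -> [] : dropped at a 30% cut-off
print(filter_suggestions(suggestions, 0.20))  # -> suggestion kept at a 20% cut-off
```

So the entity is found and scored; it is only the cut-off that hides it. That is why I suspect the scoring rather than the lookup itself.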