Hi Stefan

On Sat, May 17, 2014 at 3:49 PM, Stefan Bunk
<stefan.b...@student.hpi.uni-potsdam.de> wrote:
> The problem is that the texts I send to the chain are quite short, usually
> only one sentence, and they often contain an obviously non-English name
> like "Costa de Xurius". This confuses the language detection, which no
> longer outputs English but Spanish in this example. As a result, the
> geonames-ner engine does not even run, because the text is not in a
> language it was trained for.
>
> So, what is the right way to handle this? Can I somehow force the chain to
> emit English as the language of the text? Removing the langdetect engine
> does not work, as it is needed by the custom NER model engine.
>

This reminds me of STANBOL-660, which is about exactly this problem.
I had not been affected by it for some time, so I completely forgot
about it. I have scheduled the issue to be fixed in 0.12.1 and 1.0.0
and will try to implement it later today.

Once this is implemented, you can pass the language via the
Content-Language header and remove the LanguageDetection engine from
your chain.
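A minimal sketch of what such a request could look like, assuming the
STANBOL-660 behaviour described above (the declared Content-Language is
used instead of running language detection). The Stanbol URL and the
chain name "my-geonames-chain" are hypothetical placeholders; the
function only builds the request and does not send it.

```python
import urllib.request

# Hypothetical local Stanbol instance and chain name; adjust to your setup.
STANBOL_CHAIN_URL = "http://localhost:8080/enhancer/chain/my-geonames-chain"


def build_enhancement_request(text: str, language: str = "en",
                              url: str = STANBOL_CHAIN_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request that declares the text language."""
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "text/plain; charset=UTF-8",
            # Assumes the STANBOL-660 fix: the declared language replaces
            # the result of the LanguageDetection engine.
            "Content-Language": language,
            "Accept": "application/rdf+xml",
        },
    )

# To actually send it (requires a running Stanbol server):
#   with urllib.request.urlopen(build_enhancement_request("...")) as resp:
#       print(resp.read().decode("utf-8"))
```

Short single-sentence texts such as "University of Buenos Aires" would
then always be processed as English, regardless of the place names they
contain.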

> ----
> Furthermore, I am not satisfied with the geonames.org entity linking.
> Even when the text is correctly classified as english and the location
> entity is found, the geonames linking can't link many entities.
> Example:
> The text snippet is "University of Buenos Aires". This is the exact name of
> the entity on geonames.org. Still, I had to lower the confidence score to
> 20% to have the geonames engine find the link (confidence: 24%). Many
> entities are not even found, even when I use the exact name as on
> geonames.org and it is correctly identified as a location.
>
> Where should I look to improve the linking performance?
>

I think STANBOL-1303 is the reason for the unexpected confidence values.

You can try using the Entityhub Indexing Tool for Geonames
(entityhub/indexing/geonames) to generate your own local index for
Geonames. After installing this index in the Stanbol Entityhub you can
use the Named Entity Linking Engine [1] for entity linking. This
would also have the advantage that you do not depend on an external
service for linking.

You can use one of the Geonames indexes available at [2] for testing.
These are based on a geonames.org dump that is about one year old.

best
Rupert



[1] http://stanbol.apache.org/docs/trunk/components/enhancer/engines/namedentitytaggingengine
[2] http://dev.iks-project.eu/downloads/stanbol-indices/geonames/

-- 
| Rupert Westenthaler             rupert.westentha...@gmail.com
| Bodenlehenstraße 11                              ++43-699-11108907
| A-5500 Bischofshofen
| REDLINK.CO 
..........................................................................
| http://redlink.co/
