Re: confidence of selected-text to entity-reference match

Rupert Westenthaler Mon, 09 Dec 2013 04:04:09 -0800

Hi Claude,

All EntityLinking engines calculate a confidence for how well a label
of an Entity matches the mention in the text. While they have
parameters to configure how fuzzy the matching may be, they often do
not support explicitly setting a minimum confidence value (or at least
not directly via a single parameter). The EntityLinking engine uses a
slightly modified version of the algorithm described for the
KeywordLinkingEngine [1].


If you want to filter out all fise:EntityAnnotation instances with a
confidence below some value, the best thing is to create your own
engine that does exactly this:

* canEnhance can simply return ENHANCE_SYNCHRONOUS
* computeEnhancements needs to:
    1. get the UriRefs of the EntityAnnotations:
ci.getMetadata().filter(null, RDF_TYPE, FISE_ENTITY_ANNOTATION)
    2. get the confidence for each of them (`ea`):
EnhancementEngineHelper.get(ci.getMetadata(), ea, FISE_CONFIDENCE, ...)
    3. if it is too low, remove
        a. all outgoing triples of this enhancement: (a)
filter(ea,null,null) -> to get the Iterator; (b) remove all elements
from the Iterator
        b. all incoming triples: (a) filter(null,null,ea) and (b)
again remove all triples from the Iterator
* getServiceProperties() should return a map containing the key
ServiceProperties.ENHANCEMENT_ENGINE_ORDERING with a value <
ServiceProperties.ORDERING_POST_PROCESSING (e.g. -200) to ensure that
this engine is executed last in the chain.
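
The removal steps above can be sketched roughly as follows. This is
only a minimal, self-contained illustration and does not use the
Stanbol or Clerezza API: a plain List stands in for ci.getMetadata(),
and Triple, removeLowConfidence, sampleGraph and the string constants
are hypothetical names made up for this example. A real engine would
use filter(...) on the metadata graph and remove through the returned
Iterator as described above.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified stand-in for the enhancement metadata; NOT the Stanbol/Clerezza API.
class ConfidenceFilterSketch {

    // Hypothetical placeholders for the real property and type URIs.
    static final String RDF_TYPE = "rdf:type";
    static final String FISE_ENTITY_ANNOTATION = "fise:EntityAnnotation";
    static final String FISE_CONFIDENCE = "fise:confidence";

    static final class Triple {
        final String subject;
        final String predicate;
        final Object object;

        Triple(String subject, String predicate, Object object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }

        @Override
        public String toString() {
            return subject + " " + predicate + " " + object;
        }
    }

    static void removeLowConfidence(List<Triple> graph, double minConfidence) {
        // Step 1: collect all subjects typed as fise:EntityAnnotation.
        Set<String> annotations = new HashSet<>();
        for (Triple t : graph) {
            if (RDF_TYPE.equals(t.predicate) && FISE_ENTITY_ANNOTATION.equals(t.object)) {
                annotations.add(t.subject);
            }
        }
        // Step 2: find the annotations whose confidence is below the threshold.
        Set<String> tooLow = new HashSet<>();
        for (Triple t : graph) {
            if (annotations.contains(t.subject)
                    && FISE_CONFIDENCE.equals(t.predicate)
                    && t.object instanceof Double
                    && (Double) t.object < minConfidence) {
                tooLow.add(t.subject);
            }
        }
        // Step 3: remove outgoing (subject) and incoming (object) triples.
        graph.removeIf(t -> tooLow.contains(t.subject) || tooLow.contains(t.object));
    }

    // Made-up sample data: one confident and one weak EntityAnnotation.
    static List<Triple> sampleGraph() {
        List<Triple> g = new ArrayList<>();
        g.add(new Triple("urn:ea1", RDF_TYPE, FISE_ENTITY_ANNOTATION));
        g.add(new Triple("urn:ea1", FISE_CONFIDENCE, 0.9));
        g.add(new Triple("urn:ea1", "fise:entity-reference", "dbpedia:Paris"));
        g.add(new Triple("urn:ea2", RDF_TYPE, FISE_ENTITY_ANNOTATION));
        g.add(new Triple("urn:ea2", FISE_CONFIDENCE, 0.2));
        g.add(new Triple("urn:ea2", "fise:entity-reference", "dbpedia:Paris_Hilton"));
        g.add(new Triple("urn:other", "dc:relation", "urn:ea2")); // incoming triple
        return g;
    }

    public static void main(String[] args) {
        List<Triple> graph = sampleGraph();
        removeLowConfidence(graph, 0.5);
        for (Triple t : graph) {
            System.out.println(t);
        }
    }
}
```

With the made-up sample data and a threshold of 0.5, only the three
triples of urn:ea1 remain; the incoming dc:relation triple pointing at
urn:ea2 is removed together with its outgoing ones.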

The best approach is to extend AbstractEnhancementEngine. You can use
the LanguageDetectionEnhancementEngine [2] as an example of a simple
engine.

BTW: It is really a surprise that such an engine does not yet exist in Stanbol.

best
Rupert

[1] http://stanbol.staging.apache.org/docs/trunk/components/enhancer/engines/keywordlinkingengine#confidence-for-suggestions
[2] http://svn.apache.org/repos/asf/stanbol/branches/release-0.12/enhancement-engines/langdetect/

On Fri, Dec 6, 2013 at 7:05 PM, Claude Saunders <[email protected]> wrote:
> Hi All,
> My first question to dev (and certainly not last). First off, I'm very much
> enjoying working with stanbol. A very cool and accessible piece of work.
>
> My question is about entity linking. While using the dbpediaLinking engine,
> I see that the confidence of a match is equivalent to the percentage of
> matching chars between the content's selected-text and the
> entity-reference. I can't seem to find a way to tune this so that matches
> below a minimum confidence are rejected (other than applying a SPARQL query
> FILTER later on).
>
> I see a MinimumTokenScore in the config, but it doesn't seem to do
> anything. Does that apply only to ManagedSite entity matching?
>
>    thanks!



-- 
| Rupert Westenthaler             [email protected]
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen
