Hi Claude,

All EntityLinking engines calculate the confidence from how well a label of an Entity matches the mention in the text. While they have parameters that configure how fuzzy the matching may be, they often do not support explicitly setting a minimum confidence value (or at least not directly via a single parameter). The EntityLinking engine uses a slightly modified version of the algorithm described for the KeywordLinkingEngine [1].

If you want to filter out all fise:EntityAnnotation instances with a confidence below a given value, the best thing is to create your own engine that does exactly this:

* canEnhance can simply return ENHANCE_SYNC
* processEnhancement needs to:
    1. get the UriRefs of the EntityAnnotations: ci.getMetadata().filter(null, RDF_TYPE, FISE_ENTITY_ANNOTATION)
    2. get the confidence for each such annotation `ta`: EnhancementEngineHelper.get(ci.getMetadata(), ta, FISE_CONFIDENCE, ...)
    3. if it is too low, remove
        a. all outgoing triples of this enhancement: (a) filter(ta, null, null) to get the Iterator; (b) remove all elements from the Iterator
        b. all incoming triples: (a) filter(null, null, ta) and (b) again remove all triples from the Iterator
* getServiceProperties() should return a map containing the key ServiceProperties.ENHANCEMENT_ENGINE_ORDERING with a value < ServiceProperties.ORDERING_POST_PROCESSING (e.g. -200) to ensure that this engine is executed last in the chain.

The best is to extend AbstractEnhancementEngine. You can use the LanguageDetectionEnhancementEngine [2] as an example of a simple engine.

BTW: It is really a surprise that such an engine does not yet exist in Stanbol.

best
Rupert

[1] http://stanbol.staging.apache.org/docs/trunk/components/enhancer/engines/keywordlinkingengine#confidence-for-suggestions
[2] http://svn.apache.org/repos/asf/stanbol/branches/release-0.12/enhancement-engines/langdetect/

On Fri, Dec 6, 2013 at 7:05 PM, Claude Saunders <csaund...@thetus.com> wrote:
> Hi All,
> My first question to dev (and certainly not last). First off, I'm very much
> enjoying working with stanbol. A very cool and accessible piece of work.
>
> My question is about entity linking. While using the dbpediaLinking engine,
> I see that the confidence of a match is equivalent to the percentage of
> matching chars between the content's selected-text and the
> entity-reference.
> I can't seem to find a way to tune this so that matches
> below a minimum confidence are rejected (other than applying a SPARQL query
> FILTER later on).
>
> I see a MinimumTokenScore in the config, but it doesn't seem to do
> anything. Does that apply only to ManagedSite entity matching?
>
> thanks!

--
| Rupert Westenthaler rupert.westentha...@gmail.com
| Bodenlehenstraße 11 ++43-699-11108907
| A-5500 Bischofshofen
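
The three processEnhancement steps described in the reply above can be sketched in plain Java. This is only an illustration of the filtering logic, not a working Stanbol engine: the Triple class, the string constants and the removeLowConfidence method are hypothetical stand-ins for the metadata graph returned by ci.getMetadata() and for the Stanbol/Clerezza API, which is deliberately not used here so that the example runs on its own.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

final class ConfidenceFilterSketch {

    /** Hypothetical stand-in for one RDF triple of the enhancement metadata. */
    static final class Triple {
        final String subject;
        final String predicate;
        final String object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    // Assumed shorthand names; the real constants live in the Stanbol API.
    static final String RDF_TYPE = "rdf:type";
    static final String FISE_ENTITY_ANNOTATION = "fise:EntityAnnotation";
    static final String FISE_CONFIDENCE = "fise:confidence";

    /** Removes every entity annotation whose confidence is below minConfidence. */
    static void removeLowConfidence(List<Triple> graph, double minConfidence) {
        // Step 1: collect the subjects typed as fise:EntityAnnotation.
        Set<String> annotations = new HashSet<>();
        for (Triple t : graph) {
            if (RDF_TYPE.equals(t.predicate) && FISE_ENTITY_ANNOTATION.equals(t.object)) {
                annotations.add(t.subject);
            }
        }
        // Step 2: read the confidence of each annotation and mark the low ones.
        Set<String> rejected = new HashSet<>();
        for (Triple t : graph) {
            if (annotations.contains(t.subject)
                    && FISE_CONFIDENCE.equals(t.predicate)
                    && Double.parseDouble(t.object) < minConfidence) {
                rejected.add(t.subject);
            }
        }
        // Step 3: drop outgoing (subject) and incoming (object) triples
        // of every rejected annotation via the iterator.
        for (Iterator<Triple> it = graph.iterator(); it.hasNext();) {
            Triple t = it.next();
            if (rejected.contains(t.subject) || rejected.contains(t.object)) {
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        List<Triple> graph = new ArrayList<>();
        graph.add(new Triple("ea1", RDF_TYPE, FISE_ENTITY_ANNOTATION));
        graph.add(new Triple("ea1", FISE_CONFIDENCE, "0.9"));
        graph.add(new Triple("ea2", RDF_TYPE, FISE_ENTITY_ANNOTATION));
        graph.add(new Triple("ea2", FISE_CONFIDENCE, "0.3"));
        graph.add(new Triple("ex:other", "dc:relation", "ea2"));

        removeLowConfidence(graph, 0.5);

        // Only the two triples describing ea1 remain.
        for (Triple t : graph) {
            System.out.println(t.subject + " " + t.predicate + " " + t.object);
        }
    }
}
```

In a real engine the same loop would run inside processEnhancement, with the threshold presumably read from the engine configuration and the ordering set via getServiceProperties() as described above.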