Hi Danny,

In short: IMO using a sigmoid function is not a good way to get [0..1] bounded scores for the Geonames LocationEnhancementEngine. Using the Levenshtein similarity first and, second, an ordering based on the Geonames service scores results in confidence values consistent with those of other linking engines.
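A minimal sketch of the "Levenshtein first, service score second" idea, written in Python for brevity rather than as engine code. The function names, the length-normalized similarity, the case-insensitive comparison and the sample candidates are all assumptions made for illustration; the actual engine may compute similarity differently.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(mention: str, label: str) -> float:
    """Assumed normalization: 1 - distance / longer length, so always in [0..1]."""
    m, l = mention.lower(), label.lower()
    longest = max(len(m), len(l))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(m, l) / longest


def rank(mention, candidates):
    """Sort (label, raw_score) pairs by similarity, then by the raw service score.

    The raw score only breaks ties; it never leaves the [0..1] confidence range
    because it is not part of the confidence value itself.
    """
    scored = [(label, similarity(mention, label), raw) for label, raw in candidates]
    return sorted(scored, key=lambda t: (t[1], t[2]), reverse=True)


# Hypothetical candidates: (label, raw service score). Values are made up.
candidates = [("Paris", 92.1), ("Parys", 120.0), ("Paris", 310.5), ("Parish", 45.0)]
ranked = rank("Paris", candidates)
for label, confidence, raw in ranked:
    print(f"{label:8s} confidence={confidence:.3f} raw={raw}")
```

In this sketch the confidence stays comparable to that of other label-similarity based engines, while the unbounded service score still influences the order of equally well matching entities.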
See comments below for details:

On Wed, May 7, 2014 at 8:57 AM, Danny Ayers <danny.ay...@gmail.com> wrote:
> [oops, old list address]
>
> noticed this via Jenkins:
> [[
> The Geonames.org service changed the value range of provided scores from
> [0..100] to [0..inv]. Because of that the engine does no longer report
> fise:confidence values in the range of [0..1].
> ]]
> https://issues.apache.org/jira/browse/STANBOL-1303
>
> two possible normalization strategies are listed alongside the issue, I'd
> like to suggest another - I used it a while back on some messy numerics, is
> simple & robust:
>
> https://en.wikipedia.org/wiki/Sigmoid_function
>
> essentially
>
> out = 1/(1+exp(-in))
>
> for
> inf. < in < inf.
> gives
> -1 < out < 1
> as required.
>

As the scores returned by the Geonames web service are in the range [0..inf], the sigmoid function would only provide scores in the range [0.5..1]. In addition, most of the returned scores are large, so many of the sigmoid results would be rounded to 1.0.

The score of EntityLinking engines is expected to represent how well the mention in the text matches a label of the suggested Entity. The relevance (e.g. the popularity, page rank ...) of an Entity can be used to adapt this score to provide a more intuitive sorting of entities.

E.g. both the Entityhub Linking and the FST linking engine calculate the confidence based on the similarity of the mention with the best matching label of an Entity. In addition, they have an option that allows modifying the score by at most 0.1 based on the relevance of the Entity.

In cases where users want to preserve the score as returned by the Geonames web service, we could add it to the fise:EntityAnnotations (by using an engine-specific property).

best
Rupert

--
| Rupert Westenthaler             rupert.westentha...@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen
| REDLINK.CO ..........................................................................
| http://redlink.co/
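
PS: a small sketch of the sigmoid point made above, assuming non-negative raw scores. The sample values are hypothetical and are not actual Geonames responses; they only show that the lower half of [0..1] is never used and that larger inputs saturate to exactly 1.0 in double precision.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical raw scores (made up for illustration).
raw_scores = [0.0, 1.0, 5.0, 40.0, 100.0]
confidences = [sigmoid(s) for s in raw_scores]

for raw, conf in zip(raw_scores, confidences):
    print(f"raw={raw:6.1f} -> confidence={conf!r}")

# Count how many distinct raw scores collapse to the same confidence of 1.0.
saturated = sum(1 for c in confidences if c == 1.0)
print(f"{saturated} of {len(raw_scores)} scores are indistinguishable at 1.0")
```

Once several candidates all map to 1.0, the confidence value no longer carries any ordering information between them.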