Rupert Westenthaler created STANBOL-1230:
--------------------------------------------

             Summary: Add Lookup Cache to EntityLinking Engine
                 Key: STANBOL-1230
                 URL: https://issues.apache.org/jira/browse/STANBOL-1230
             Project: Stanbol
          Issue Type: Improvement
          Components: Enhancement Engines
    Affects Versions: 0.12.0
            Reporter: Rupert Westenthaler
            Assignee: Rupert Westenthaler
             Fix For: 0.12.0


The EntityLinkingEngine should cache results of lookups on the EntitySearchers.

Entities often recur in analyzed Documents. Caching the results for tokens 
that were already looked up should therefore provide considerable performance 
improvements, as statistics show that ~90% of the processing time of the 
EntityLinking engine is spent on the entity look-up. 

So if 20% of all Entity mentions refer to recurring Entities, the processing 
time should be reduced by about 18% (20% of the ~90% spent on look-ups).
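
The estimate can be reproduced with a small sketch. It assumes that a cache 
hit costs effectively nothing and that every look-up costs about the same; 
the class and method names are hypothetical and not part of Stanbol.

```java
/** Hypothetical helper reproducing the back-of-the-envelope estimate. */
final class CacheSavingEstimate {

    /**
     * @param lookupShare fraction of processing time spent on look-ups (e.g. 0.9)
     * @param hitRatio    fraction of look-ups answered from the cache (e.g. 0.2)
     * @return estimated fraction of total processing time saved,
     *         assuming cache hits are free and look-ups cost the same
     */
    static double estimatedSaving(double lookupShare, double hitRatio) {
        return lookupShare * hitRatio;
    }
}
```

With the figures from this issue, estimatedSaving(0.9, 0.2) is roughly 0.18, 
i.e. about 18% of the total processing time.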

The cache will use the list of search strings as key and the list of returned 
Entities as value. The cache will only hold look-up results for the 
currently analyzed document. 

EntityLinking statistics will be updated to include the cache hit percentage.
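
A minimal sketch of such a per-document cache, including the hit percentage 
for the statistics, could look like the following. It is an illustration 
only, not the actual Stanbol implementation: LookupCache, its methods and the 
use of a Function in place of an EntitySearcher are all assumptions made for 
this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical per-document look-up cache. One instance is meant to be
 * created per analyzed document and discarded afterwards.
 */
final class LookupCache<E> {

    // key: list of search strings, value: list of entities returned for them
    private final Map<List<String>, List<E>> results = new HashMap<>();
    private int lookups;
    private int hits;

    /**
     * Returns the cached result for the search strings, or delegates to the
     * searcher (a stand-in for an EntitySearcher) and caches its result.
     */
    List<E> lookup(List<String> searchStrings,
                   Function<List<String>, List<E>> searcher) {
        lookups++;
        // defensive copy so later changes to the caller's list cannot corrupt the key
        List<String> key = Collections.unmodifiableList(new ArrayList<>(searchStrings));
        List<E> cached = results.get(key);
        if (cached != null) {
            hits++;
            return cached;
        }
        List<E> found = Collections.unmodifiableList(new ArrayList<>(searcher.apply(key)));
        results.put(key, found);
        return found;
    }

    /** Cache hit percentage (0 to 100) over all look-ups made so far. */
    double hitPercentage() {
        return lookups == 0 ? 0.0 : 100.0 * hits / lookups;
    }
}
```

Creating a new LookupCache for each document keeps the scope limited to the 
currently analyzed document, so no eviction or invalidation logic is needed 
in this sketch.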

This issue affects both the trunk (1.0.0-SNAPSHOT) and the stable 0.12 
release branch. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)
