Rupert Westenthaler created STANBOL-1013:
--------------------------------------------

             Summary: Separate (Entity)Spotting and (Entity)Linking
                 Key: STANBOL-1013
                 URL: https://issues.apache.org/jira/browse/STANBOL-1013
             Project: Stanbol
          Issue Type: Bug
          Components: Engine - Entity Linking, Enhancer
            Reporter: Rupert Westenthaler
            Assignee: Rupert Westenthaler


Currently the EntityLinking engine performs two major tasks

(1) Spotting: detects the words in the analyzed Text that should be linked to 
the controlled Vocabulary. For that, words are categorized as "linkable", 
"matchable" and "others". Chunks are also considered for this task.

(2) Linking: creates searches for "linkable" words while considering 
"matchable" words. Labels of suggested Entities are tokenized and matched 
against "linkable" and "matchable" words in the text. The 
EntityLinkingConfiguration is used to configure this task.
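
To make the two tasks concrete, here is a minimal sketch of word categories 
and label matching. All type and method names (WordCategory, Word, 
LabelMatcher) are hypothetical and not part of the Stanbol API; the matching 
rule is a deliberate simplification of what the engine does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only; these are NOT Stanbol classes.
enum WordCategory { LINKABLE, MATCHABLE, OTHER }

final class Word {
    final String text;
    final WordCategory category;

    Word(String text, WordCategory category) {
        this.text = text;
        this.category = category;
    }
}

final class LabelMatcher {

    /** Spotting result: indexes of LINKABLE words, each of which would trigger a search. */
    static List<Integer> linkableIndexes(List<Word> words) {
        List<Integer> indexes = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            if (words.get(i).category == WordCategory.LINKABLE) {
                indexes.add(i);
            }
        }
        return indexes;
    }

    /**
     * Linking step: tokenizes the label on whitespace and compares the tokens
     * with consecutive words starting at {@code start}. Only tokens matched by
     * LINKABLE or MATCHABLE words are counted; matching stops at the first mismatch.
     */
    static int matchLabel(List<Word> words, int start, String label) {
        String[] labelTokens = label.toLowerCase(Locale.ROOT).trim().split("\\s+");
        int matched = 0;
        int pos = start;
        for (String token : labelTokens) {
            if (pos >= words.size()) {
                break;
            }
            Word word = words.get(pos);
            if (!word.text.toLowerCase(Locale.ROOT).equals(token)) {
                break;
            }
            if (word.category != WordCategory.OTHER) {
                matched++;
            }
            pos++;
        }
        return matched;
    }
}
```

In this sketch a candidate label is scored by the number of "linkable" and 
"matchable" words it covers, so "others" words such as "of" can appear in a 
label without contributing to the score.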


See the documentation of the EntityLinkingEngine [1] for details.


(1) is configured by using the TextProcessingConfiguration and implemented by 
the ProcessingState class. (2) is configured by the EntityLinkingConfiguration 
and implemented by the EntityLinker class.

Proposed Workplan:
=====

1. Clean up and improve the internal APIs used by the EntityLinking engine

2. Define a public API for describing Entity Spotting results. Possibilities 
include
    * using the metadata of the ContentItems (e.g. fise:TextAnnotations)
    * annotations in the AnalyzedText contentpart
    * some additional ContentPart

3. Split up (1) and (2) into two separate EnhancementEngines:
   * (1) NlpSpottingEngine: Spots potential Entities by using NLP processing 
results
   * (2) EntityLinkingEngine: Links Entities of a Controlled Vocabulary based 
on Spotting results

4. Integrate Named Entity Linking into the new Spotting & Linking workflow
    * By allowing Spotters to annotate spotted Entities with additional 
metadata (e.g. the type as suggested by NER)
    * By extending the EntityLinkingEngine to use that metadata when 
searching/matching Entities from linked Vocabularies.
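
The split proposed in steps 2 to 4 could look roughly like the following 
sketch. Spot, Spotter, SpotLinker, VocabEntry and the two implementations are 
hypothetical names chosen for illustration; they are not existing Stanbol 
types, and the capitalized-word spotter is only a naive stand-in for real NLP 
processing results.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; these are NOT Stanbol classes.

/** A spotting result: a text span plus optional metadata such as an NER type. */
final class Spot {
    final int start;
    final int end;
    final String text;
    final String nerType; // may be null when the spotter has no type information

    Spot(int start, int end, String text, String nerType) {
        this.start = start;
        this.end = end;
        this.text = text;
        this.nerType = nerType;
    }
}

/** Task (1): finds potential entity mentions in the text. */
interface Spotter {
    List<Spot> spot(String text);
}

/** Task (2): links spots to entities of a controlled vocabulary. */
interface SpotLinker {
    List<String> link(Spot spot);
}

/** One vocabulary entry: URI, label and type. */
final class VocabEntry {
    final String uri;
    final String label;
    final String type;

    VocabEntry(String uri, String label, String type) {
        this.uri = uri;
        this.label = label;
        this.type = type;
    }
}

/** Naive spotter: every word starting with an upper-case letter becomes a spot. */
final class CapitalizedWordSpotter implements Spotter {
    private static final Pattern CAPITALIZED = Pattern.compile("\\p{Lu}\\p{L}*");

    @Override
    public List<Spot> spot(String text) {
        List<Spot> spots = new ArrayList<>();
        Matcher m = CAPITALIZED.matcher(text);
        while (m.find()) {
            spots.add(new Spot(m.start(), m.end(), m.group(), null));
        }
        return spots;
    }
}

/** Linker over an in-memory vocabulary; uses the NER type as a filter when present. */
final class InMemoryLinker implements SpotLinker {
    private final List<VocabEntry> vocabulary;

    InMemoryLinker(List<VocabEntry> vocabulary) {
        this.vocabulary = new ArrayList<>(vocabulary);
    }

    @Override
    public List<String> link(Spot spot) {
        List<String> uris = new ArrayList<>();
        for (VocabEntry entry : vocabulary) {
            boolean labelMatches = entry.label.equalsIgnoreCase(spot.text);
            boolean typeMatches = spot.nerType == null || spot.nerType.equals(entry.type);
            if (labelMatches && typeMatches) {
                uris.add(entry.uri);
            }
        }
        return uris;
    }
}
```

Because the linker depends only on the Spot type, a spotter based on NER can 
be swapped in without changing the linking code, and any type it attaches to 
a spot narrows the suggestions.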

[1] 
http://stanbol.apache.org/docs/trunk/components/enhancer/engines/entityhublinking
