Rupert Westenthaler created STANBOL-1291:
--------------------------------------------

             Summary: Phonetic Linking
                 Key: STANBOL-1291
                 URL: https://issues.apache.org/jira/browse/STANBOL-1291
             Project: Stanbol
          Issue Type: Sub-task
          Components: Enhancement Engines
            Reporter: Rupert Westenthaler


Add phonetic-based EntityLinking support to Apache Stanbol.

The idea is to

1. start off with a sound file
2. use a speech-to-text engine like STANBOL-1007 to get the transcript
3. run NLP processing on the transcript
4. use the FST Linking Engine (STANBOL-1128) to link against a SolrIndex 
configured for phonetic linking [1]
5. correct the text transcript based on the labels of the linked entities

The main question to be answered is whether the phonetic matching (step 4) 
can correctly link entities even if the spellings in the text transcript are 
incorrect.
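
To make the question concrete, here is a minimal sketch of steps 4 and 5 on 
a toy example. It is not the FST Linking Engine: it uses a plain American 
Soundex encoder and a dictionary lookup purely for illustration, and the 
labels, the transcript and the helper names (soundex, build_index, 
correct_transcript) are all hypothetical. The Solr filter in [1] can be 
configured with other encoders, which may behave differently.

```python
# Assumption: American Soundex stands in for whatever phonetic encoder
# the SolrIndex would actually be configured with.
CODES = {}
for digit, letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                       "4": "l", "5": "mn", "6": "r"}.items():
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the 4-character American Soundex key of an ASCII word."""
    letters = [c for c in word.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    key = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with equal codes
        code = CODES.get(ch, "")
        if code and code != prev:
            key += code
        prev = code  # a vowel resets prev, so a repeated code is kept
    return (key + "000")[:4]


def build_index(labels):
    """Map phonetic key -> entity label (hypothetical stand-in for the index)."""
    return {soundex(label): label for label in labels}


def correct_transcript(tokens, index):
    """Step 5: replace a token by the label that shares its phonetic key."""
    return [index.get(soundex(tok), tok) for tok in tokens]


if __name__ == "__main__":
    index = build_index(["Smith", "Meyer"])  # hypothetical entity labels
    print(" ".join(correct_transcript("smyth met maier".split(), index)))
    # prints "Smith met Meyer"
```

In this toy case the misspelled tokens still reach the right labels because 
they share a phonetic key. Whether that holds for real transcripts and a 
real vocabulary is exactly what this issue needs to find out.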

Additional things to validate are

* is the quality of the text transcript good enough
* does NLP processing still work sufficiently well on text transcripts

This will definitely also require adaptations to the FST Linking Engine, as 
the score is currently calculated based on the Levenshtein distance between 
the mention and the best matching label of an entity, which does not make 
sense for this specific use case. 
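
A small sketch of why a Levenshtein-based score is a poor fit here. The 
normalisation used in surface_score (one minus the distance divided by the 
longer length) is an assumption chosen for illustration and is not 
necessarily the formula the engine uses. The mention and label are made up.

```python
def levenshtein(a, b):
    """Classic edit distance (insert, delete, substitute all cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def surface_score(mention, label):
    """Hypothetical score: 1.0 for identical strings, lower per edit."""
    longest = max(len(mention), len(label))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(mention.lower(), label.lower()) / longest


if __name__ == "__main__":
    # A transcript mention that sounds like the label but is spelled wrong.
    print(surface_score("Jon Smyth", "John Smith"))
```

With such a score, a mention that the phonetic index has already accepted 
as a match is penalised again for every spelling difference, so the worse 
the transcript, the lower the score of an otherwise correct link.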

[1] 
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PhoneticFilterFactory



--
This message was sent by Atlassian JIRA
(v6.2#6252)
