[
https://issues.apache.org/jira/browse/STANBOL-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922864#comment-13922864
]
ELOUMBAT ASSOUA ALBERT commented on STANBOL-1291:
-------------------------------------------------
Hi,
Could you kindly forward the changes to my email address?
> Phonetic Linking
> ----------------
>
> Key: STANBOL-1291
> URL: https://issues.apache.org/jira/browse/STANBOL-1291
> Project: Stanbol
> Issue Type: New Feature
> Components: Enhancement Engines
> Reporter: Rupert Westenthaler
> Labels: gsoc2014, mentoring
>
> Add phonetic-based EntityLinking support to Apache Stanbol.
> The idea is to:
> 1. start off with a sound file
> 2. use a speech to text engine like STANBOL-1007 to get the transcript
> 3. use NLP processing
> 4. use the FST Linking Engine (STANBOL-1128) to link against a SolrIndex
> configured for phonetic linking [1].
> 5. correct the text transcript based on labels of linked entities.
> The main question to be answered is whether the phonetic matching (step 4)
> can correctly link entities even if the spellings in the text transcript are
> incorrect.
> Additional things to validate are:
> * is the quality of the text transcript good enough
> * does NLP processing still work sufficiently well on text transcripts
> This will definitely also require adaptations to the FST Linking Engine, as
> the score is currently calculated based on the Levenshtein distance between
> the mention and the best matching label of an entity, which does not make
> sense for this specific use case.
> [1]
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PhoneticFilterFactory
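
To illustrate why a Levenshtein-based score fits poorly with phonetic
matching, here is a minimal Python sketch. It is not Stanbol or Solr code.
It assumes plain American Soundex as the phonetic encoder (the issue does
not say which encoder the index would use), and the function names
soundex, levenshtein and phonetic_match are hypothetical helpers written
only for this example.

```python
# Sketch only: Soundex is an assumed encoder, not necessarily the one
# that would be configured for the Solr index.
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word: str) -> str:
    """Return the 4-character American Soundex code of a word."""
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    result = word[0].upper()
    prev = _CODES.get(word[0], "")
    for ch in word[1:]:
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            result += digit
        # 'h' and 'w' do not separate letters with the same code; vowels do.
        if ch not in "hw":
            prev = digit
    return (result + "000")[:4]


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def phonetic_match(mention: str, label: str) -> bool:
    """True if both strings share the same (assumed) phonetic code."""
    code = soundex(mention)
    return code != "" and code == soundex(label)


if __name__ == "__main__":
    # A misspelled transcript mention versus an entity label.
    mention, label = "Robert", "Rupert"
    print(soundex(mention), soundex(label))               # same code
    print(phonetic_match(mention, label))                 # phonetic hit
    print(levenshtein(mention.lower(), label.lower()))    # edit distance > 0
```

In this sketch the two spellings share one phonetic code, so a phonetic
index would treat them as a match, while the edit distance between them is
non-zero and would lower a Levenshtein-based score. A score adapted for
this use case would presumably need to compare the phonetic forms rather
than the raw spellings.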
--
This message was sent by Atlassian JIRA
(v6.2#6252)