Tommaso Teofili created JOSHUA-341:
--------------------------------------

             Summary: Integrated Transliteration
                 Key: JOSHUA-341
                 URL: https://issues.apache.org/jira/browse/JOSHUA-341
             Project: Joshua
          Issue Type: Task
          Components: core, language packs
            Reporter: Tommaso Teofili


Many of the released language packs translate from languages with non-Latin
scripts. Words that cannot be translated are therefore passed through unchanged
to the output, where they cannot even be read by someone who doesn't know that
script. At the same time, many untranslatable words are simply transliterated
words. For example, an Arabic word might be an English word (like a name or
technical term) that has simply been written in Arabic script. Such words can
be transliterated back into Latin script. It would be good to add built-in
transliteration models that can be applied to all out-of-vocabulary words and
enabled for certain languages. Transliteration models can be built over the
same bitext used to train the translation models, using techniques like those
of Sajjad, Fraser, and Schmid (2012) [1].

[1] : http://www.anthology.aclweb.org/P/P12/P12-1049.pdf
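
To make the proposal concrete, here is a minimal sketch of the intended
behaviour, written in Python for brevity rather than against Joshua's Java
code. Every name in it (transliterate_oovs, toy_transliterate, enabled_langs,
the character table) is hypothetical and not part of Joshua. The
character-by-character table is only a stand-in for a real model, which would
be learned from the bitext as in [1].

```python
from typing import Callable, Iterable, List, Set

# Hypothetical toy table (assumption): three Arabic letters mapped to Latin.
# A real transliteration model would be learned from the bitext instead.
TOY_TABLE = {
    "\u0644": "l",  # ARABIC LETTER LAM
    "\u0646": "n",  # ARABIC LETTER NOON
    "\u062f": "d",  # ARABIC LETTER DAL
}


def toy_transliterate(word: str) -> str:
    """Map each character through TOY_TABLE, leaving unknown characters as-is."""
    return "".join(TOY_TABLE.get(ch, ch) for ch in word)


def transliterate_oovs(
    tokens: Iterable[str],
    vocabulary: Set[str],
    source_lang: str,
    enabled_langs: Set[str],
    transliterate: Callable[[str], str],
) -> List[str]:
    """Transliterate out-of-vocabulary tokens for enabled languages only.

    In-vocabulary tokens are returned untouched; translating them is the
    decoder's job and is outside the scope of this sketch.
    """
    if source_lang not in enabled_langs:
        return list(tokens)
    return [tok if tok in vocabulary else transliterate(tok) for tok in tokens]


if __name__ == "__main__":
    oov = "\u0644\u0646\u062f\u0646"  # lam, noon, dal, noon
    print(transliterate_oovs([oov], set(), "ar", {"ar"}, toy_transliterate))
```

With the toy table, the out-of-vocabulary token above comes out as "lndn":
not a correct spelling, but readable by someone who doesn't know the script.
Keeping the language switch and the model as parameters mirrors the request
that transliteration be enabled only for certain languages.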



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
