[ https://issues.apache.org/jira/browse/JOSHUA-341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16802603#comment-16802603 ]
Tommaso Teofili commented on JOSHUA-341:
----------------------------------------

Thanks [~thammegowda], maybe we can use it, or port it to Java so that we can integrate it more easily.

> Integrated Transliteration
> --------------------------
>
>                 Key: JOSHUA-341
>                 URL: https://issues.apache.org/jira/browse/JOSHUA-341
>             Project: Joshua
>          Issue Type: Task
>          Components: core, language packs
>            Reporter: Tommaso Teofili
>            Priority: Major
>              Labels: gsoc2019
>
> Many of the released language packs translate from languages with non-Latin
> scripts. Words that cannot be translated are therefore passed through
> unchanged into the translation, where they cannot even be read by someone
> who doesn't know that script.
> At the same time, many untranslatable words are simply transliterated words.
> For example, an Arabic word might be an English word (like a name or
> technical term) that has simply been written in Arabic script. These words
> can be transliterated back. It would be good to add built-in transliteration
> models that can be applied to all out-of-vocabulary words and enabled for
> certain languages. Transliteration models can be built over the same bitext
> using techniques like those of Sajjad, Fraser, and Schmid (2012) [1].
>
> [1] http://www.anthology.aclweb.org/P/P12/P12-1049.pdf

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)