[ 
https://issues.apache.org/jira/browse/JOSHUA-341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16802603#comment-16802603
 ] 

Tommaso Teofili commented on JOSHUA-341:
----------------------------------------

Thanks [~thammegowda]. Maybe we can use it, or port it to Java so that we can 
integrate it more easily.

> Integrated Transliteration
> --------------------------
>
>                 Key: JOSHUA-341
>                 URL: https://issues.apache.org/jira/browse/JOSHUA-341
>             Project: Joshua
>          Issue Type: Task
>          Components: core, language packs
>            Reporter: Tommaso Teofili
>            Priority: Major
>              Labels: gsoc2019
>
> Many of the released language packs translate from languages with non-Latin 
> scripts. Words that cannot be translated are therefore passed through to the 
> output unchanged and cannot even be read by someone who doesn't know that 
> script. 
> At the same time, many untranslatable words are simply transliterated words. 
> For example, an Arabic word might be an English word (like a name or 
> technical term) that has simply been written in Arabic. These words can be 
> transliterated. It would be good to add built-in transliteration models that 
> can be applied to all out-of-vocabulary words and enabled for certain 
> languages. Transliteration models can be built over the same bitext using 
> techniques like those of Sajjad, Fraser, and Schmid (2012) [1].
> [1] : http://www.anthology.aclweb.org/P/P12/P12-1049.pdf
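
As a rough illustration of the behaviour requested above (transliterate only 
out-of-vocabulary words, and only when enabled for a language), here is a 
minimal Java sketch. Everything in it is an assumption made for illustration: 
the class name OovTransliterator, the process method, and the tiny hand-written 
Cyrillic-to-Latin table are hypothetical and are not part of Joshua. The fixed 
character table merely stands in for a learned model; it is not the method of 
Sajjad, Fraser, and Schmid (2012).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Toy sketch: transliterate out-of-vocabulary (OOV) tokens with a fixed table. */
class OovTransliterator {
    // Hypothetical hand-written table (Cyrillic to Latin), written with Unicode
    // escapes. A real system would learn character mappings from the bitext.
    private static final Map<Character, String> TABLE = new HashMap<>();
    static {
        TABLE.put('\u043c', "m"); // Cyrillic em
        TABLE.put('\u043e', "o"); // Cyrillic o
        TABLE.put('\u0441', "s"); // Cyrillic es
        TABLE.put('\u043a', "k"); // Cyrillic ka
        TABLE.put('\u0432', "v"); // Cyrillic ve
        TABLE.put('\u0430', "a"); // Cyrillic a
    }

    private final Set<String> vocabulary; // words the translation model knows
    private final boolean enabled;        // per-language switch (assumed)

    OovTransliterator(Set<String> vocabulary, boolean enabled) {
        this.vocabulary = vocabulary;
        this.enabled = enabled;
    }

    /** Returns the token unchanged unless it is OOV and transliteration is on. */
    String process(String token) {
        if (!enabled || vocabulary.contains(token)) {
            return token;
        }
        StringBuilder out = new StringBuilder();
        for (char c : token.toCharArray()) {
            String mapped = TABLE.get(c);
            // Characters missing from the table are passed through as-is.
            out.append(mapped != null ? mapped : String.valueOf(c));
        }
        return out.toString();
    }
}
```

With transliteration enabled, a token that is absent from the vocabulary is 
mapped character by character, while known tokens, and every token when the 
switch is off, are returned untouched.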



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
