[jira] [Commented] (JOSHUA-341) Integrated Transliteration

Thamme Gowda (JIRA) Sat, 16 Mar 2019 17:04:09 -0700


    [ 
https://issues.apache.org/jira/browse/JOSHUA-341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16794381#comment-16794381
 ]


Thamme Gowda commented on JOSHUA-341:
-------------------------------------

Here is another handy tool to consider. 

[https://github.com/isi-nlp/uroman] 

It uses Unicode tables and rules to transliterate words from non-Roman 
scripts into Roman script, with no training needed.

(Sorry, yet another Perl script, but sometimes, maybe most of the time, this 
is all we need.)
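
To illustrate the table-driven idea, here is a minimal Python sketch. It is 
not uroman's code or data: the tiny Cyrillic table and the function names 
(romanize, transliterate_oovs) are made up for this example, and the set of 
out-of-vocabulary words is assumed to come from the decoder.

```python
# Hypothetical mapping table, for illustration only. A real tool such as
# uroman relies on far larger Unicode-derived tables plus script-specific rules.
CYRILLIC_TO_LATIN = {
    "м": "m", "о": "o", "с": "s", "к": "k", "в": "v", "а": "a",
}


def romanize(word):
    """Map each character through the table; unknown characters pass through."""
    pieces = []
    for ch in word:
        mapped = CYRILLIC_TO_LATIN.get(ch.lower())
        if mapped is None:
            pieces.append(ch)                    # not in the table: keep as-is
        elif ch.isupper():
            pieces.append(mapped.capitalize())   # preserve capitalization
        else:
            pieces.append(mapped)
    return "".join(pieces)


def transliterate_oovs(tokens, oov_words):
    """Romanize only tokens that were passed through as out-of-vocabulary."""
    return [romanize(tok) if tok in oov_words else tok for tok in tokens]
```

With this toy table, transliterate_oovs(["I", "visited", "Москва"], 
{"Москва"}) leaves the translated tokens alone and turns the passed-through 
word into "Moskva".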

 

> Integrated Transliteration
> --------------------------
>
>                 Key: JOSHUA-341
>                 URL: https://issues.apache.org/jira/browse/JOSHUA-341
>             Project: Joshua
>          Issue Type: Task
>          Components: core, language packs
>            Reporter: Tommaso Teofili
>            Priority: Major
>              Labels: gsoc2019
>
> Many of the released language packs translate from languages with non-Latin 
> scripts. Words that cannot be translated are therefore passed through to the 
> output unchanged and cannot even be read by someone who doesn't know that script. 
> At the same time, many untranslatable words are simply transliterated words. 
> For example, an Arabic word might be an English word (like a name or 
> technical term) that has simply been written in Arabic. These words can be 
> transliterated. It would be good to add built-in transliteration models that 
> can be applied to all out-of-vocabulary words and enabled for certain 
> languages. Transliteration models can be built over the same bitext using 
> techniques like Sajjad, Fraser, and Schmid (2012) [1].
> [1] : http://www.anthology.aclweb.org/P/P12/P12-1049.pdf



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
