Hi Hieu,

Should I apply tokenization and truecasing to both the corpus file and the parallel files, or only to the parallel files?
Thanks

From: [email protected] [mailto:[email protected]] On Behalf Of Hieu Hoang
Sent: Monday, November 3, 2014 8:18 PM
To: [email protected]
Cc: moses-support
Subject: Re: [Moses-support] Tokenization issue

Hi Ihab,

At its most basic, tokenization separates punctuation from words. However, it can also be used to split a word into its morphemes to make it easier to process.

Moses doesn't include a very good Arabic tokeniser. Each language needs a nonbreaking_prefix file, located in scripts/share/nonbreaking_prefixes. This file doesn't exist for Arabic, so the tokenizer uses the English file instead.

If you create a nonbreaking_prefixes file for Arabic, please share it with us. Or use a tool like MADA to tokenize your Arabic data.

On 28 October 2014 14:40, Ihab Ramadan <[email protected]> wrote:

Dears,

I have a misunderstanding about what tokenization really does. What I think is that it makes the translation of text like
translated text
give the same output as
“translated” text
or
translated.text
or
translated text .
That is, it ignores any punctuation in the translated text. Am I right?
I did the tokenization on my data, but this is not happening.

Note: the tokenizer script has to be given the language, and it could not recognize Arabic (ar), which is my target language.

Best Regards
Ihab Ramadan | Senior Developer | Saudisoft - Egypt <http://www.saudisoft.com/>
Tel +2 02 330 320 37 Ext- 0 | Mob +201007570826 | Fax +20233032036
Follow us on LinkedIn <http://www.linkedin.com/company/77017?trk=vsrp_companies_res_name&trkInfo=VSRPsearchId%3A1489659901402995947155%2CVSRPtargetId%3A77017%2CVSRPcmpt%3Aprimary> | Facebook <https://www.facebook.com/pages/Saudisoft-Co-Ltd/289968997768973?ref_type=bookmark> | Twitter <https://twitter.com/Saudisoft>

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support

--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu
