Hi Ihab,

At its most basic, tokenization separates punctuation from words. However, it can also be used to split a word into its morphemes, which makes the word easier to process.
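To make the first point concrete, here is a simplified Python sketch of punctuation splitting. It is not the Moses tokenizer, which handles many more cases. The names `tokenize` and `NONBREAKING_PREFIXES` are hypothetical. The prefix set stands in for a per-language nonbreaking-prefix file, which tells the tokenizer when a trailing period belongs to the word (as in "Mr.") and should not be split off.

```python
from __future__ import annotations

import re

# Hypothetical, tiny English-style prefix list for illustration only.
# Moses keeps a real per-language list under scripts/share/nonbreaking_prefixes.
NONBREAKING_PREFIXES = {"Mr", "Mrs", "Dr", "etc"}


def tokenize(text: str, prefixes: set[str] = NONBREAKING_PREFIXES) -> list[str]:
    """Split punctuation off words; punctuation is kept as separate tokens."""
    # Surround quotes, brackets and most punctuation (but not '.') with spaces.
    text = re.sub(r'([,;:!?"“”()])', r" \1 ", text)
    tokens = []
    for word in text.split():
        # Split off a trailing period unless the rest of the word is a known
        # nonbreaking prefix. Periods inside a word are left alone.
        if word.endswith(".") and len(word) > 1 and word[:-1] not in prefixes:
            tokens.extend([word[:-1], "."])
        else:
            tokens.append(word)
    return tokens


if __name__ == "__main__":
    print(tokenize("“translated” text"))   # ['“', 'translated', '”', 'text']
    print(tokenize("translated text ."))   # ['translated', 'text', '.']
    print(tokenize("Mr. Smith arrived."))  # ['Mr.', 'Smith', 'arrived', '.']
```

Note that the punctuation is not thrown away: `“translated” text` and `translated text .` still give different token sequences. Tokenization only makes punctuation visible as separate tokens so the word "translated" is seen consistently.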
Moses doesn't include a very good Arabic tokeniser. Each language needs a nonbreaking_prefix file, located in scripts/share/nonbreaking_prefixes. This file doesn't exist for Arabic, so the tokenizer uses the English one instead. If you create a nonbreaking_prefix file for Arabic, please share it with us. Alternatively, use a tool like MADA to tokenize your Arabic data.

On 28 October 2014 14:40, Ihab Ramadan wrote:
> Dears,
>
> I have a misunderstanding about what tokenization really does.
>
> What I think is that it makes the translation of
>   translated text
> give the same output as
>   “translated” text
>   translated.text
>   translated text .
> that is, it ignores any punctuation in the translated text.
>
> Am I right?
>
> I ran the tokenization on my data, but this is not happening.
>
> Note: in the tokenizer script I have to give it the language, and it
> could not recognize Arabic (ar), which is my target language.
>
> Best Regards
>
> Ihab Ramadan | Senior Developer | Saudisoft <http://www.saudisoft.com/> - Egypt
>
> _______________________________________________
> Moses-support mailing list
> http://mailman.mit.edu/mailman/listinfo/moses-support

--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu
