Dear all,

When I tokenize my files, the apostrophes are replaced with “'”,
which makes sense, but on the other hand it completely breaks the meaning and
the order of the words. For example:

 

Sentence before tokenization:

Src: keep your notification's payload under 5 kb.

Trg: اجعل حمولة الإعلام أقل من 5 كيلوبايت.

Sentence after tokenization:

Src: keep your notification ' s payload under 5 kb .

Trg: اجعل حمولة الإعلام أقل من 5 كيلوبايت .
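
To make the change on the source side concrete, here is a minimal Python sketch. It is not the Moses tokenizer; the function naive_tokenize and its regular expression are only a rough approximation written for this example, and it assumes that apostrophes and basic punctuation are simply split off as separate tokens.

```python
import re


def naive_tokenize(sentence):
    """Hypothetical stand-in for a tokenizer: splits off ' . , ! ? as tokens."""
    # Surround each apostrophe or punctuation mark with spaces,
    # then split on whitespace to get the token list.
    spaced = re.sub(r"(['.,!?])", r" \1 ", sentence)
    return spaced.split()


raw = "keep your notification's payload under 5 kb."
tokens = naive_tokenize(raw)

# Show the tokenized sentence and how the token count changes.
print(" ".join(tokens))
print(len(raw.split()), "tokens before,", len(tokens), "tokens after")
```

With this sketch the source sentence grows from 7 to 10 whitespace-separated tokens, so every word after "notification" moves to a different position.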

If I translate “keep” without tokenization, Moses generates “اجعل”,
which is correct, but with tokenization Moses generates “الإعلام”,
which suggests that the alignment is broken.

Am I doing something wrong?

Am I missing something, or is this just the normal behavior when I use tokenization?

Thanks 

 

Best Regards

Ihab Ramadan | Senior Developer | Saudisoft - Egypt <http://www.saudisoft.com/>
Tel +2 02 330 320 37 Ext-0 | Mob +201007570826 | Fax +20233032036

 

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
