Hi Ihab

If you run the tokeniser with the same arguments, it should give the same results at test time as in training. The spacing around the apostrophe depends on the context. If you post the full sentences, someone may be able to explain why they are handled differently.
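
To help find the sentences worth posting, here is a minimal sketch of a check you could run on the tokenised training and test output. It is not part of Moses: the function name `apostrophe_spacing` and the two sample lines are hypothetical, and it assumes apostrophes appear either as a raw `'` or as the `&apos;` entity. It only reports whether each apostrophe has whitespace before and after it, so that differences between the two outputs are easy to spot.

```python
import re
from collections import Counter

# An apostrophe, either as raw text or as the &apos; entity (assumed forms).
APOSTROPHE = re.compile(r"&apos;|'")


def apostrophe_spacing(text):
    """Count (space_before, space_after) pairs for each apostrophe in text.

    The start and end of the string are treated as whitespace.
    """
    counts = Counter()
    for match in APOSTROPHE.finditer(text):
        start, end = match.span()
        space_before = start == 0 or text[start - 1].isspace()
        space_after = end == len(text) or text[end].isspace()
        counts[(space_before, space_after)] += 1
    return counts


# Hypothetical tokenised lines, made up for illustration only.
train_line = "the man ' s car"
test_line = "the man 's car"

print(apostrophe_spacing(train_line))
print(apostrophe_spacing(test_line))
```

If the two counters differ for what should be the same sentence, that sentence, together with the exact tokeniser command used for each run, is a good candidate to post to the list.
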
cheers - Barry

On 05/01/15 08:09, Ihab Ramadan wrote:
> Dears,
>
> Using the tokenizer on the training files replaces the apostrophes
> with "' s" (with a space), but if I use the same script to tokenize
> a sentence it renders the apostrophes as "'s" (without a space).
>
> This problem confuses the decoder during translation.
>
> How can I solve this problem?
>
> Thanks
>
> Best Regards
>
> Ihab Ramadan | Senior Developer | Saudisoft <http://www.saudisoft.com/> - Egypt
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
