Re: [Moses-support] Tokenization problem

Barry Haddow Mon, 05 Jan 2015 08:59:01 -0800

Hi Ihab

If you run the tokeniser with the same arguments then it should give the 
same results in test as in training. The spaces around the apostrophe 
depend on the context - maybe if you post the full sentences someone can 
explain why they are handled differently.
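
As a rough illustration of this context dependence, the toy Python sketch below shows how the same apostrophe can end up as "&apos;s" or "&apos; s" depending on the settings it is processed with. The function name `split_apostrophes` and its `lang` parameter are hypothetical, and the rules are a simplification assumed for the example, not the actual tokeniser code; it only models an apostrophe between two letters.

```python
import re


def split_apostrophes(text: str, lang: str = "en") -> str:
    """Toy apostrophe splitting; NOT the real tokeniser rules (assumed for illustration)."""
    if lang == "en":
        # English-style: keep the apostrophe attached to what follows (user 's)
        text = re.sub(r"([A-Za-z])'([A-Za-z])", r"\1 '\2", text)
    elif lang in ("fr", "it"):
        # French/Italian-style: keep the apostrophe attached to what precedes (l' homme)
        text = re.sub(r"([A-Za-z])'([A-Za-z])", r"\1' \2", text)
    else:
        # Fallback: isolate every apostrophe as its own token
        text = text.replace("'", " ' ")
    # Escape the apostrophe as an entity and collapse repeated whitespace
    text = text.replace("'", "&apos;")
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    sentence = "the user's file"
    print(split_apostrophes(sentence, "en"))  # the user &apos;s file
    print(split_apostrophes(sentence, "fr"))  # the user&apos; s file
    print(split_apostrophes(sentence, "xx"))  # the user &apos; s file
```

In this sketch the input sentence is identical in all three calls; only the setting differs, which is why it is worth checking that training and test data are tokenised with exactly the same arguments.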


cheers - Barry

On 05/01/15 08:09, Ihab Ramadan wrote:
>
> Dear all,
>
> Using the tokenizer on the training files replaces the apostrophes 
> with “&apos; s” (with a space), but if I use the same script to tokenize 
> a single sentence it turns the apostrophes into “&apos;s” (without a space).
>
> This problem confuses the decoder during translation.
>
> How can I solve this problem?
>
> Thanks
>
> Best Regards
>
> Ihab Ramadan | Senior Developer | Saudisoft <http://www.saudisoft.com/> 
> - Egypt | Tel +2 02 330 320 37 Ext- 0 | Mob +201007570826 | 
> Fax +20233032036
>
>
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
