Dear all,
I found the problem.
At line 289 of the tokenizer.perl script, just add a space, like
this.
The original code
$text =~ s/([\p{IsAlpha}])[']([\p{IsAlpha}])/$1 ' $2/g;
The modified one
$text =~ s/([\p{IsAlpha}])[']([\p{IsAlpha}])/$1 ' $2/g;
With this modification, tokenizing a file gives the same result as
tokenizing a single segment.
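To make the one-space difference easier to see, here is a minimal sketch in Python rather than Perl. It is only an illustration, not the tokenizer's code: the function names are made up, `[^\W\d_]` is an approximation of Perl's `[\p{IsAlpha}]`, and it does not claim which replacement is actually on line 289 of a given copy of the script. It only shows that the two replacement templates reproduce the two outputs quoted in this thread.

```python
import re

# Approximation of Perl's [\p{IsAlpha}]: any Unicode letter
# (a word character that is neither a digit nor an underscore).
LETTER = r"[^\W\d_]"

# An apostrophe between two letters, e.g. the one in "printer's".
CONTRACTION = re.compile(rf"({LETTER})'({LETTER})")


def split_keep_attached(text: str) -> str:
    """Replacement "$1 '$2": a space only before the apostrophe."""
    return CONTRACTION.sub(r"\1 '\2", text)


def split_both_sides(text: str) -> str:
    """Replacement "$1 ' $2": a space on both sides of the apostrophe."""
    return CONTRACTION.sub(r"\1 ' \2", text)


sample = "configuring your printer's wireless connection"
print(split_keep_attached(sample))  # configuring your printer 's wireless connection
print(split_both_sides(sample))     # configuring your printer ' s wireless connection
```

Whichever form is used, the training files and the sentences sent to the decoder need to be tokenized the same way.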
Thanks
From: Ihab Ramadan [mailto:[email protected]]
Sent: Wednesday, January 14, 2015 11:14 AM
To: [email protected]
Subject: RE: Tokenization problem
Dear all,
I still have this problem. To avoid confusing the decoder, I used the
no-escape parameter of the tokenizer.perl script, but an extra space is
still added after the apostrophe when tokenizing files, whereas
tokenizing a single segment produces no extra space.
For example
In the file
which will guide you through connecting and configuring your printer's
wireless connection. -> which will guide you through connecting and
configuring your printer ' s wireless connection .
As a segment
which will guide you through connecting and configuring your printer's
wireless connection. -> which will guide you through connecting and
configuring your printer 's wireless connection .
I wonder why the same script generates two different outputs.
I have no experience in Perl, so I could not find the line of code that
behaves differently depending on whether the segment is in a file or is
passed to the script as a single segment.
Please help.
From: Ihab Ramadan [mailto:[email protected]]
Sent: Monday, January 5, 2015 10:09 AM
To: [email protected]
Subject: Tokenization problem
Dear all,
Using the tokenizer on the training files replaces apostrophes with
' s (with a space), but if I use the same script to tokenize a single
sentence it produces 's (without a space).
This problem confuses the decoder during translation.
How can I solve this problem?
Thanks
Best Regards
Ihab Ramadan | Senior Developer | Saudisoft - Egypt
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support