Good catch, Ken. I see your point. For example, considering the likely
language pair (EN-AR), there could be some non-printing characters in
the text file that the copy/paste clipboard drops.
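
One way to check the raw file for such characters is a short script along these lines. It is only a sketch in Python and is not part of the Moses scripts; the function name find_suspicious_chars and the choice of what counts as "suspicious" (anything non-ASCII, plus control and format characters other than tab) are my own assumptions for illustration.

```python
import sys
import unicodedata


def find_suspicious_chars(text):
    """Return (line, column, codepoint, name) for each character worth a look.

    Assumption for this sketch: "worth a look" means any non-ASCII character,
    or any control (Cc) / format (Cf) character other than a tab.
    """
    hits = []
    # Split on "\n" only, so unusual line separators are reported, not swallowed.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ch == "\t":
                continue
            category = unicodedata.category(ch)
            if ord(ch) > 0x7F or category in ("Cc", "Cf"):
                # Control characters have no Unicode name, hence the fallback.
                name = unicodedata.name(ch, "<unnamed>")
                hits.append((line_no, col, "U+%04X" % ord(ch), name))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # newline="" keeps carriage returns so that CRLF line endings show up too.
    with open(sys.argv[1], encoding="utf-8", newline="") as handle:
        for hit in find_suspicious_chars(handle.read()):
            print("line %d col %d: %s %s" % hit)
```

Run against the original training file (not a copy/pasted excerpt), it would list things like a typographic apostrophe or a right-to-left mark together with its position, which a terminal or clipboard may hide or alter.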


On 01/15/2015 08:39 AM, Kenneth Heafield wrote:
> I'll interject that it is plausible there is some weird Unicode going on
> there, and copy-paste on Linux sometimes canonicalizes graphemes.  Whilst
> I'm inclined to side with Tom, the only way to sort this out is with the
> raw file from Ihab as e.g. a gzipped attachment.
>
> Kenneth
>
> On 01/14/2015 08:33 PM, Tom Hoar wrote:
>> I just ran the same sentence through the newest github clone (today).
>>
>> corporamgr@domt-v2:~/Public/src/mosesdecoder/scripts/tokenizer$
>> ./tokenizer.perl -no-escape -q -l en < test.txt
>> which will guide you through connecting and configuring your printer 's
>> wireless connection .
>> which will guide you through connecting and configuring your printer 's
>> wireless connection .
>> which will guide you through connecting and configuring your printer 's
>> wireless connection .
>> which will guide you through connecting and configuring your printer 's
>> wireless connection .
>> which will guide you through connecting and configuring your printer 's
>> wireless connection .
>>
>> This is not a Perl script problem. What shell and command line are you
>> using for your "in the file" results? You'll find the problem in either
>> your shell or your custom tool chain(s) before you run tokenizer.perl.
>>
>>
>>
>> On 01/14/2015 04:13 PM, Ihab Ramadan wrote:
>>> Dears,
>>>
>>> I still have this problem. To avoid confusing the decoder, I used the
>>> "-no-escape" parameter in the tokenizer.perl script, but an extra space
>>> is still added after the apostrophe when tokenizing files. When
>>> tokenizing a single segment, the output has no extra space.
>>>
>>> For example
>>>
>>> In the file
>>>
>>> “which will guide you through connecting and configuring your
>>> printer's wireless connection.” -> “which will guide you through
>>> connecting and configuring your printer ' s wireless connection .”
>>>
>>> As a segment
>>>
>>> “which will guide you through connecting and configuring your
>>> printer's wireless connection.” -> “which will guide you through
>>> connecting and configuring your printer 's wireless connection .”
>>>
>>> I wonder why the same script generates two different outputs.
>>>
>>> I have no experience in Perl, so I could not find the line of code that
>>> behaves differently depending on whether the segment is in a file or
>>> passed on its own as a parameter to the script.
>>>
>>> Please help
>>>
>>> *From:*Ihab Ramadan [mailto:[email protected]]
>>> *Sent:* Monday, January 5, 2015 10:09 AM
>>> *To:* [email protected]
>>> *Subject:* Tokenization problem
>>>
>>>   
>>>
>>> Dears,
>>>
>>> Using the tokenizer on the training files replaces apostrophes
>>> with “&apos; s” (with a space), but if I use the same script to tokenize
>>> a single sentence, it produces “&apos;s” (without a space).
>>>
>>> This problem confuses the decoder during translation.
>>>
>>> How can I solve this problem?
>>>
>>> Thanks
>>>
>>>   
>>>
>>> Best Regards
>>>
>>> /Ihab Ramadan/ | Senior Developer | Saudisoft <http://www.saudisoft.com/>
>>> - Egypt
>>>
>>>   
>>>
>>>
>>>
>>> _______________________________________________
>>> Moses-support mailing list
>>> [email protected]
>>> http://mailman.mit.edu/mailman/listinfo/moses-support
