Thank you Thomas,

So if I keep the text with these special characters, it will not cause
problems? I ask because the training corpus does not contain these
characters; only the development and test corpora do.

Thank you :)

Best,


2014-02-21 14:40 GMT+01:00 Thomas Meyer <[email protected]>:

>
>
> Hi,
>
> That is not a 'problem' but XML entity mark-up for special characters
> (see http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references).
> You don't have to worry about this, as the tokenizer script escapes these
> characters the same way everywhere they occur.
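A minimal Python sketch of why this is harmless. It assumes the table below matches the one in tokenizer.pl, so verify it against your copy of the script. The helpers escape_special_chars and unescape_special_chars are hypothetical and are not Moses functions. As long as the training, development and test data all go through the same tokenizer, the apostrophe in "'s" becomes "&apos;s" everywhere, so the tokens still line up:

```python
# Illustrative sketch only: ESCAPES is assumed to mirror the special-character
# table in Moses' tokenizer.pl; verify it against your copy of the script.
ESCAPES = [
    ("&", "&amp;"),   # escaped first so the "&" of later entities is untouched
    ("|", "&#124;"),
    ("<", "&lt;"),
    (">", "&gt;"),
    ("'", "&apos;"),
    ('"', "&quot;"),
    ("[", "&#91;"),
    ("]", "&#93;"),
]


def escape_special_chars(text: str) -> str:
    """Replace each special character with its XML entity (hypothetical helper)."""
    for char, entity in ESCAPES:
        text = text.replace(char, entity)
    return text


def unescape_special_chars(text: str) -> str:
    """Undo escape_special_chars; "&amp;" is decoded last (hypothetical helper)."""
    for char, entity in reversed(ESCAPES):
        text = text.replace(entity, char)
    return text


if __name__ == "__main__":
    # Already tokenized text: "EU's" has been split into "EU 's".
    train_line = escape_special_chars("the EU 's budget")
    test_line = escape_special_chars(
        "EU 's Luxembourg-based statistical office reported")
    print(test_line)  # EU &apos;s Luxembourg-based statistical office reported
    # Both corpora use the same entity, so the shared tokens still match.
    print(sorted(set(train_line.split()) & set(test_line.split())))
    # ['&apos;s', 'EU']
```

If the plain characters are needed again later, for example in the final translation output, the escaping can be reversed. The unescape_special_chars function above sketches one way to do that.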
>
> Best,
> Thomas
>
>
> On 21 February 2014 14:20, [email protected] <
> [email protected]> wrote:
>
>>
>> Hello all,
>>
>> I have a problem with the tokenizer.pl script. The output text contains
>> some special punctuation, for example:
>>
>> EU &apos;s Luxembourg-based statistical office reported
>>
>> The input file is a plain .txt file.
>>
>> Is there any solution for this problem?
>>
>> Thank you in advance
>>
>>
>> Best,
>> --
>> *Cyrine*
>>
>> _______________________________________________
>> Moses-support mailing list
>> [email protected]
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>>
>


-- 

*Cyrine NASRI*
*Ph.D. Student in Computer Science*