Re: Workaround to a Unicode bug needed

Pierre Nugues Mon, 06 Sep 2010 06:50:34 -0700

Dear Shawn,

Thank you for your answer. However, it does not seem to work.
I tried two versions of Perl, the standard Mac installation (5.8.8) and
ActivePerl 5.12.1, and neither produces the correct output.


Here is what the output should be, one token per line (only the first few are 
shown). Some words contain accented characters, and the quotation mark (») has 
been removed:
--
Tjuvgömmare
!
säga
skatorna
och
se
ut
som
samvetet
självt
.
--

Here is what the two Perl versions actually produce.

Version 5.12.1. The output mixes UTF-8 and Latin-1, and the quotation mark is 
not removed:
----
»TjuvgÃ¶mmare
!
»
sÃ¤ga
skatorna
och
se
ut
som
samvetet
sjÃ¤lvt
.
---
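
One common way to get output such as "sÃ¤ga" is for UTF-8 bytes to be read as Latin-1 somewhere along the way and then written out again as UTF-8. The sketch below only illustrates that mechanism. Whether this is what happens inside the tokenizer is an assumption, and misread_as_latin1 is a hypothetical helper name, not something from the program under discussion.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: return the text you get when the UTF-8 bytes of
# $text are misinterpreted as Latin-1 (one character per byte).
sub misread_as_latin1 {
    my ($text) = @_;
    my $utf8_bytes = encode('UTF-8', $text);     # "ä" (U+00E4) becomes bytes C3 A4
    return decode('ISO-8859-1', $utf8_bytes);    # C3 -> "Ã", A4 -> "¤"
}

# "säga", written with an escape so this source file stays pure ASCII.
my $word    = "s\x{E4}ga";
my $garbled = misread_as_latin1($word);

binmode STDOUT, ':encoding(UTF-8)';
print "$garbled\n";    # prints: sÃ¤ga
```

The same mechanism turns "ö" (bytes C3 B6) into "Ã¶", which matches the first line of the 5.12.1 output.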

Version 5.8.8. The quotation mark is not removed, and the accented characters 
are dropped, which splits the words in two:

---
»Tjuvg
mmare
!
»
s
ga
skatorna
och
se
ut
som
samvetet
sj
lvt
.
---
Pierre

On 6 Sept. 2010, at 15:25, Shawn H Corey wrote:

> On Mon, 2010-09-06 at 15:10 +0200, Pierre Nugues wrote:
>> 
>> I wrote a simple tokenizer for texts containing Latin9 characters. It
>> does not behave as expected with the Swedish text below and I would
>> like to find a workaround. 
> 
> Add these lines to top of your program:
> 
> use strict;
> use warnings;
> 
> binmode STDIN, 'encoding(utf8)';
> binmode STDOUT, 'encoding(utf8)';
> 
> 
> -- 
> Just my 0.00000002 million dollars worth,
>  Shawn
> 
> Programming is as much about organization and communication
> as it is about coding.
> 
> The secret to great software:  Fail early & often.
> 
> Eliminate software piracy:  use only FLOSS.
> 
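
For comparison, here is a minimal sketch of a tokenizer that yields the expected output listed at the top of this message. It is not the original program, which is not shown in this thread. The tokenize function is hypothetical, and the sample sentence is reconstructed from the output listings, so treat both as assumptions. The sketch matches letters with \p{L} rather than \w and upgrades the string first, since \w can fail to match characters in the 128-255 range on byte strings in older Perls.

```perl
use strict;
use warnings;

# Hypothetical tokenizer (an illustration, not the program from this thread).
# Expects $text to hold decoded characters, not raw bytes.
sub tokenize {
    my ($text) = @_;
    utf8::upgrade($text);             # force Unicode semantics for matching
    $text =~ s/[\x{AB}\x{BB}]//g;     # drop the quotation marks « and »
    # A token is a run of letters or digits, or one other non-space character.
    my @tokens = $text =~ /([\p{L}\p{N}]+|[^\p{L}\p{N}\s])/g;
    return @tokens;
}

# Sample sentence reconstructed from the listings (an assumption); non-ASCII
# characters are written as escapes so the source file stays pure ASCII.
my $sample = "\x{BB}Tjuvg\x{F6}mmare!\x{BB} s\x{E4}ga skatorna och se ut "
           . "som samvetet sj\x{E4}lvt.";

binmode STDOUT, ':encoding(UTF-8)';
print "$_\n" for tokenize($sample);
```

To process real input instead of the sample, the input handle needs a layer that matches the file's actual encoding, for example binmode STDIN, ':encoding(UTF-8)' for a UTF-8 file or ':encoding(iso-8859-15)' for a Latin9 file, followed by a loop over the lines of STDIN.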


