silent upgrading situations

2007-11-12 Thread E R
I was wondering if there are any other good examples of when Perl will silently upgrade (as in utf8::upgrade) a string. For instance, Perl will do this when you concatenate a non-utf8 string with a utf8 string: $a = "Hello"; # utf8 flag not set $b = chr(1024); $a .= $b; # $a now has its
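
The concatenation case described in this message can be checked directly. The following is a minimal sketch, not part of the original post; the variable names $x and $y are illustrative (chosen to avoid the special sort variables $a and $b), and utf8::is_utf8 is used only to inspect the internal flag.

```perl
use strict;
use warnings;

my $x = "Hello";     # plain ASCII literal, utf8 flag not set
my $y = chr(1024);   # code point above 255, so Perl stores it with the flag set

my $before = utf8::is_utf8($x) ? 1 : 0;
$x .= $y;            # concatenation upgrades $x silently
my $after  = utf8::is_utf8($x) ? 1 : 0;

print "flag before: $before, flag after: $after\n";
print "length: ", length($x), "\n";   # counted in characters, not bytes
```

The string's length is still counted in characters, so the upgrade changes only the internal representation, not the string's value.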

Re: Explaining this behavior (was Re: good name for characters matching [^\0-\377]?)

2007-10-22 Thread E R
On 10/19/07, Juerd Waalboer [EMAIL PROTECTED] wrote: E R skribis 2007-10-19 17:14 (-0500): So it seems that in light of this one should always use Encode::encode with these modules to ensure the data is represented the way you want it. Encode::encode, Encode::encode_utf8, or utf8::encode

Re: Explaining this behavior (was Re: good name for characters matching [^\0-\377]?)

2007-10-22 Thread E R
On 10/22/07, Juerd Waalboer [EMAIL PROTECTED] wrote: There's an alternative way of viewing this: there are two types of strings: binary and text. If you encode text, you get binary. I think I'm trying to make a slightly different point: part of what Encode::encode MUST do is to create a Perl

Re: de-utf8-ing a string

2007-10-18 Thread E R
On 10/17/07, Juerd Waalboer [EMAIL PROTECTED] wrote: utf8::downgrade(); Thanks!
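
For reference, here is a rough sketch of how the append routine from the original question might look with utf8::downgrade. It assumes, as the question states, that the input never holds characters above 255; without the optional second argument, utf8::downgrade dies if that assumption is violated.

```perl
use strict;
use warnings;

my $buf = "";

sub append {
    my $x = shift;          # work on a copy so the caller's string is untouched
    utf8::downgrade($x);    # dies if $x contains a character above 255
    $buf .= $x;
}

my $s = "caf\xe9";
utf8::upgrade($s);          # simulate input that arrives with the utf8 flag set
append($s);

print utf8::is_utf8($buf) ? "flagged" : "not flagged", "\n";
print "length: ", length($buf), "\n";
```

Because both operands of the concatenation are unflagged after the downgrade, $buf stays in its byte-oriented representation.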

Re: good name for characters matching [^\0-\377]?

2007-10-18 Thread E R
I should have added that in my presentation I am attempting to present Perl strings from a character-set-agnostic perspective. So, even though there is a strong bias for Perl to treat character ordinals > 255 as Unicode code-points, I don't want people to automatically think Unicode when

just to test my understanding...

2007-10-17 Thread E R
Is this regex: $has_wide = ($str =~ m/[^\0-\377]/); the same as this function? sub has_wide { my $str = shift; for my $i (0..length($str)-1) { return 1 if (ord(substr($str, $i, 1)) >= 256); } 0; } But this doesn't seem to work: $has_wide = ($str =~ m/[\x{100}-]/);
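
A quick way to test the question is to run both forms over a few sample strings. This is only a sketch: the names has_wide_loop and has_wide_regex and the sample list are made up for illustration. It also shows one likely reason the open-ended class misbehaves: a trailing "-" inside a character class is a literal hyphen, not an open range, so a closed range such as [\x{100}-\x{10FFFF}] is needed instead.

```perl
use strict;
use warnings;

# Loop version from the question: true if any character has ordinal >= 256.
sub has_wide_loop {
    my $str = shift;
    for my $i (0 .. length($str) - 1) {
        return 1 if ord(substr($str, $i, 1)) >= 256;
    }
    return 0;
}

# Regex version from the question.
sub has_wide_regex {
    my $str = shift;
    return $str =~ m/[^\0-\377]/ ? 1 : 0;
}

my @samples = ("", "abc", "caf\xe9", "x" . chr(0x100), chr(1024));
my $disagreements = 0;
for my $s (@samples) {
    $disagreements++ if has_wide_loop($s) != has_wide_regex($s);
}
print "disagreements: $disagreements\n";

# The trailing hyphen is literal, so this class matches a plain "-".
my $hyphen_matches = ("-" =~ m/[\x{100}-]/) ? 1 : 0;
# A closed range expresses "any character with ordinal >= 256".
my $closed_range   = (chr(1024) =~ m/[\x{100}-\x{10FFFF}]/) ? 1 : 0;
print "hyphen matches open-ended class: $hyphen_matches\n";
print "closed range matches chr(1024): $closed_range\n";
```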

de-utf8-ing a string

2007-10-17 Thread E R
Hello, I need an efficient way to do this: my $buf; sub append { my $x = shift; my $new; for (my $i = 0; $i < length($x); $i++) { $new .= chr(ord(substr($x, $i, 1))); } $buf .= $new; } In practice, $buf will not have its utf8 flag set, and $x may have it set, but will not contain

questions about encode/decode

2007-10-15 Thread E R
Just a couple of questions: 1. What is the result of Encode::encode("iso-8859-1", $x) if $x is not a utf8 string (i.e. Encode::is_utf8($x) returns false)? 2. What is the result of $string = decode("iso-8859-1", $octets) if $octets is a utf8 string? Thanks!
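
The first question can be probed with a small experiment. The sketch below is not from the thread; it assumes a reasonably recent Encode and only compares the octets produced for the same characters stored with and without the utf8 flag. The decode case from the second question is deliberately left out.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $plain    = "caf\xe9";   # utf8 flag not set
my $upgraded = $plain;
utf8::upgrade($upgraded);   # same characters, utf8 flag set

my $octets_plain    = encode("iso-8859-1", $plain);
my $octets_upgraded = encode("iso-8859-1", $upgraded);

print "same octets\n" if $octets_plain eq $octets_upgraded;
print "length: ", length($octets_plain), "\n";
```

If the two results compare equal, encode is working on the characters of the string rather than on its internal representation.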

Re: questions about encode/decode

2007-10-15 Thread E R
the overhead to constantly look up the encoder sub for every fragment of HTML I need to escape. Thanks... On 10/15/07, Juerd Waalboer [EMAIL PROTECTED] wrote: E R skribis 2007-10-15 16:25 (-0500): 1. What is the result of Encode::encode("iso-8859-1", $x) if $x is not a utf8 string (i.e. Encode