Hi list.
I wonder if anyone would mind confirming this for me:
I've just spotted some strange behaviour with Unicode and regexes in Perl
5.8.1, as demonstrated in the following script.
$junktext is a string of Unicode characters containing 3 smileys; one
smiley is at the end of the string.
When doing a regex replace (s///) with the case-insensitive switch (/i)
OFF, all 3 smileys are replaced.
When doing the replace with the case-insensitive switch ON, only the
first 2 smileys are replaced.
Can anyone confirm this on their setup? Does it still occur in the
latest Perl, i.e. 5.8.5 or even 5.8.6?
#!/usr/bin/perl
#
# This is a test script to demonstrate a problem found with Unicode in
# Perl's regex engine.
# The version of Perl tested is 5.8.1, NOT the latest version.
#
# $junktext is a string of Unicode characters containing 3 smileys; one
# smiley is at the end of the string.
#
# When doing a regex replace (s///) with the case-insensitive switch
# OFF, all 3 smileys are replaced.
# When doing the replace with the case-insensitive switch ON, only the
# first 2 smileys are replaced.
#
# author: [EMAIL PROTECTED] 2004/12/01 15:30:00
use strict;
use warnings;
use utf8;
use CGI (':standard');
use Encode qw/is_utf8 decode/;
binmode(STDOUT, ":utf8");
BEGIN {
print header(-type => "text/html", -charset => "utf-8");
print start_html(-encoding => 'utf-8', -title => "Some sample characters");
print "\n\n";
}
my $junktext =
"\x{0142}\x{e7}\x{263a}\x{0104}\x{263a}\x{0104}re\x{e7}enu\x{263a}";
# The /gi version below is active and replaces only 2 smileys. Swap the
# comments to use the plain /g version, which replaces all 3... why?
# my $matches = ($junktext =~ s/(\x{263a})/* was smily */g);
my $matches = ($junktext =~ s/(\x{263a})/* was smily */gi);
print $matches . " = " . $junktext;
END {
print "\n\n", end_html;
}
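
A reduced version of the same test, without the CGI dependency, might make it
easier to compare results across Perl versions. This is only a sketch: the
variable names ($original, $count_plain, $count_folded) are made up for
illustration, and it assumes the HTML output is not needed to trigger the
behaviour.

```perl
#!/usr/bin/perl
use strict;
use warnings;

binmode(STDOUT, ":utf8");

# Same test string as above: 3 smileys (U+263A), one at the very end.
my $original =
  "\x{0142}\x{e7}\x{263a}\x{0104}\x{263a}\x{0104}re\x{e7}enu\x{263a}";

# s///g returns the number of substitutions made (or an empty string
# if there were none, hence the "|| 0").
my $plain = $original;
my $count_plain = ($plain =~ s/\x{263a}/* was smily */g) || 0;

my $folded = $original;
my $count_folded = ($folded =~ s/\x{263a}/* was smily */gi) || 0;

print "perl version: $]\n";
print "without /i:   $count_plain replaced\n";
print "with /i:      $count_folded replaced\n";
print $count_plain == $count_folded ? "counts agree\n" : "counts DIFFER\n";
```

If the two counts differ, that Perl shows the problem described above.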