I'm using the stock Perl 5.8.1 on Panther, and it seems that not everything handles utf8 gracefully. (I haven't worked in Perl for a while, not since MacPerl 5.0.3, I think.) Regular expressions seem to work, as do input and output, but some of the modules and built-in functions don't work as well as I had hoped.

In particular I've noticed that "substr" doesn't seem to work correctly when dealing with wide characters. For example:

use utf8;
...
$blah =~ m/<wide_regex>/g;
$position = pos $blah;

seems to give the correct character position, but

$matched = substr($blah,
                  $position - length($blah),
                  length($blah));

doesn't put the matched text into $matched when there are wide characters in $blah -- i.e., it seems to work off bytes rather than characters.
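
One way to check whether the offsets are being counted in characters or in bytes is a small self-contained test like the sketch below. It assumes the string is either raw UTF-8 bytes or has been decoded with the Encode module; the sample text and the variable names ($bytes, $chars, $from_chars, $from_bytes) are made up for illustration and are not taken from the real program.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Made-up sample data: "naive cafe" with accents, as raw UTF-8 bytes,
# written with escapes so the encoding of this source file does not matter.
my $bytes = "na\xC3\xAFve caf\xC3\xA9";

# Decoding turns the 12 bytes into a 10-character string.
my $chars = decode('UTF-8', $bytes);

# Match against the character string; pos() and @- are character offsets.
$chars =~ m/caf\x{E9}/g or die "no match";
my $end   = pos $chars;      # offset just past the match
my $start = $-[0];           # offset where the match began
my $from_chars = substr($chars, $start, $end - $start);

# The same offsets applied to the undecoded bytes land in the wrong place.
my $from_bytes = substr($bytes, $start, $end - $start);

binmode STDOUT, ':utf8';
printf "chars: %d long, match [%s]\n", length($chars), $from_chars;
printf "bytes: %d long, slice [%s]\n", length($bytes), $from_bytes;
```

If the two slices differ, as they should here, the string being passed to substr in the real program may still be undecoded bytes, which is only a guess at the cause.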

Are these issues documented somewhere, and are there standard techniques for dealing with them?


John Blumel


