I'm not sure if this is a mod_perl problem or not, but I can't reproduce it
under regular perl, so I thought I'd post here. The setup is Apache 1.3.29,
mod_perl 1.29 and perl 5.8.4.
The problem is occurring in the following piece of code. I've tried creating
a test case, but I can't seem to narrow it down. Just creating a basic
handler to test this seems to work, but when it's used like this buried deep
in some code, it fails. Always a bugger of a problem to track down.
Anyway, the problem seems to be with using "join" where the array has utf-8
strings in it. The resultant string does NOT have the utf-8 flag set. The
basic problem code is this:
$BodyText = join("\n", @Lines[0 .. (@Lines < 3 ? @Lines-1 : 2)]) . "\n";
Narrowing it down a bit, and dumping the internal structures like so:
warn '$Lines[0]: ' . $Lines[0];
warn 'utf-8 $Lines[0]: ' . is_utf8($Lines[0]);
Dump($Lines[0]);
$BodyText = join("\n", $Lines[0]);
warn '$BodyText: ' . $BodyText;
warn 'utf-8 $BodyText: ' . is_utf8($BodyText);
Dump($BodyText);
I get:
$Lines[0]: Hej mor,
utf-8 $Lines[0]: 1
SV = PV(0x9a051a4) at 0xa27f828
REFCNT = 1
FLAGS = (POK,pPOK,UTF8)
PV = 0xa2f0008 "Hej mor,"\0 [UTF8 "Hej mor,"]
CUR = 8
LEN = 9
Which looks fine, but then the joined result:
$BodyText: Hej mor,
utf-8 $BodyText: at /home/mod_perl/hm/Data/Store/Mailbox.pm line 400.
SV = PVMG(0xa279140) at 0x8cb9228
REFCNT = 1
FLAGS = (PADBUSY,PADMY,GMG,SMG,pPOK)
IV = 0
NV = 0
PV = 0xa2bbf50 "Hej mor,\n"\0
CUR = 9
LEN = 408
MAGIC = 0xa397cd8
MG_VIRTUAL = &PL_vtbl_taint
MG_TYPE = PERL_MAGIC_taint(t)
Ouch, that seems wrong. No utf-8 flag, and the string seems to be marked as
tainted, even though the inputs aren't? I thought maybe it had something to
do with the fact that $BodyText had been assigned to earlier and obviously was
tainted, and wasn't losing the taint when the new value was assigned to it.
So I changed to:
$#Lines = 0;
warn '$Lines[0]: ' . $Lines[0];
warn 'utf-8 $Lines[0]: ' . is_utf8($Lines[0]);
Dump($Lines[0]);
my $NewBodyText = join("\n", $Lines[0]);
warn '$NewBodyText: ' . $NewBodyText;
warn 'utf-8 $NewBodyText: ' . is_utf8($NewBodyText);
Dump($NewBodyText);
Which gives:
$Lines[0]: Hej mor, at /home/mod_perl/hm/Data/Store/Mailbox.pm line 393.
utf-8 $Lines[0]: 1 at /home/mod_perl/hm/Data/Store/Mailbox.pm line 394.
SV = PV(0x99f7a94) at 0xa386e68
REFCNT = 1
FLAGS = (POK,pPOK,UTF8)
PV = 0xa2bc188 "Hej mor,"\0 [UTF8 "Hej mor,"]
CUR = 8
LEN = 9
$BodyText: Hej mor,
utf-8 $BodyText: at /home/mod_perl/hm/Data/Store/Mailbox.pm line 400.
SV = PVMG(0xa3b61a8) at 0xa346cc0
REFCNT = 1
FLAGS = (PADBUSY,PADMY,POK,pPOK)
IV = 0
NV = 0
PV = 0xa3dde10 "Hej mor,\n"\0
CUR = 9
LEN = 162
Ah, so the magic taint stuff is now gone (though it is still a PVMG rather
than a PV?), but it still doesn't have the UTF-8 flag set. The fact that this
string doesn't have any utf-8 chars isn't the problem: it happens on all of
them, even those that do have utf-8 chars. There is no 'use bytes' or
anything at the top of the module, so I don't think that's the problem,
though I don't think that should actually affect things, should it, since it
only controls how the actual source code is interpreted? I tried explicitly
doing 'use utf8' to check, but no difference.
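For what it's worth, here is a minimal sketch of the difference between the two pragmas, as far as I understand them. 'use utf8' only declares that the source file itself is UTF-8, while 'use bytes' changes how string operations behave at run time inside its lexical scope. Neither one should clear the flag on a string that already has it. The sub names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helpers, for illustration only.

# Length in characters (normal string semantics).
sub char_length {
    my ($str) = @_;
    return length($str);
}

# Length in bytes of perl's internal representation. 'use bytes' is
# lexically scoped, so it only affects the rest of this sub.
sub byte_length {
    my ($str) = @_;
    use bytes;
    return length($str);
}

my $s = "\x{1234}";    # one character, stored internally as three UTF-8 bytes

printf "chars=%d bytes=%d utf8=%d\n",
    char_length($s), byte_length($s), utf8::is_utf8($s) ? 1 : 0;
```

The point is only that 'use bytes' changes what length() reports inside its scope; the string's own flag is untouched.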
Testing on a small standalone program from the command line, it does seem to
work as expected:
[EMAIL PROTECTED] root]# perl -e 'use Devel::Peek; $a="\x{1234}"; @a = ("a", $a,
"b"); $c = join "d", @a; Dump($c);'
SV = PV(0x811ee40) at 0x81318d0
REFCNT = 1
FLAGS = (POK,pPOK,UTF8)
PV = 0x812e0e8 "ad\341\210\264db"\0 [UTF8 "ad\x{1234}db"]
CUR = 7
LEN = 8
Which actually raises a general perl question I just wanted to check. If you
have two strings and concat them, and one has the utf-8 flag and the other
doesn't, does the resultant string have the utf-8 flag set? Assuming that the
non-utf8 flagged string is ASCII, this will work fine. If it has chars >
127 in it though, it'll create a rubbish string...
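A minimal sketch of the concatenation case under plain perl (not mod_perl). As far as I understand it, perl upgrades the unflagged string as if it were Latin-1, so the result is only rubbish when the unflagged string really holds raw UTF-8 octets. concat_info is a made-up helper name, and the sample strings are assumptions for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: concatenate two strings and report the utf-8 flag
# and the character length of the result.
sub concat_info {
    my ($left, $right) = @_;
    my $joined = $left . $right;
    return (utf8::is_utf8($joined) ? 1 : 0, length($joined));
}

my $flagged = "\x{1234}";                   # character string, utf-8 flag on
my $latin1  = "\xe9";                       # one byte, flag off, read as U+00E9
my $octets  = encode('UTF-8', "\x{e9}");    # two raw bytes \xc3\xa9, flag off

# Unflagged Latin-1 text is upgraded cleanly: 2 characters in the result.
printf "latin1: utf8=%d length=%d\n", concat_info($latin1, $flagged);

# Unflagged UTF-8 octets: each byte becomes its own character, so the
# result has 3 characters (the double-encoded "rubbish" case).
printf "octets: utf8=%d length=%d\n", concat_info($octets, $flagged);
```

In both cases the result carries the utf-8 flag; what differs is whether the upgraded characters are the ones you meant.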
Ok, so to summarise, I think I see two problems here:
1. Assigning an untainted value to a value that was previously tainted
leaves the new value tainted
2. join with utf-8 strings doesn't seem to leave the joined string with the
utf-8 flag on
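To check both points in one go, something like the following sketch could be dropped next to the join. sv_state is a hypothetical helper name and the sample line is made up. It relies on utf8::is_utf8 and Scalar::Util's tainted, and it says nothing about why mod_perl behaves differently; it just makes the before and after state easy to compare in the error log.

```perl
use strict;
use warnings;
use Scalar::Util qw(tainted);

# Hypothetical helper: summarise the two properties in question.
# It looks at $_[0] directly so that no copy is made before checking.
sub sv_state {
    return sprintf "utf8=%d tainted=%d",
        utf8::is_utf8($_[0]) ? 1 : 0,
        tainted($_[0])       ? 1 : 0;
}

# Made-up sample data with one non-ASCII character.
my @Lines    = ("Hej mor, \x{1234}");
my $BodyText = join("\n", @Lines) . "\n";

warn 'in:  ' . sv_state($Lines[0]) . "\n";
warn 'out: ' . sv_state($BodyText) . "\n";
```

Under plain perl without taint mode I would expect both lines to report the flag set and no taint; the interesting comparison is what the same code reports inside the handler.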
Seems all a bit weird to me...
Rob