I'm not sure if this is a mod_perl problem or not, but I can't reproduce it under regular perl, so I thought I'd post here. Anyway, it's Apache 1.3.29, mod_perl 1.29 and perl 5.8.4.
The problem is occurring in the following piece of code. I've tried creating a test case, but I can't seem to narrow it down. Just creating a basic handler to test this seems to work, but when it's used like this, buried deep in some code, it fails. Always a bugger of a problem to track down.

Anyway, the problem seems to be with using "join" where the array has utf-8 strings in it. The resultant string does NOT have the utf-8 flag set. The basic problem code is this:

    $BodyText = join("\n", @Lines[0 .. (@Lines < 3 ? @Lines-1 : 2)]) . "\n";

Narrowing it down a bit, and dumping the internal structures like so:

    warn '$Lines[0]: ' . $Lines[0];
    warn 'utf-8 $Lines[0]: ' . is_utf8($Lines[0]);
    Dump($Lines[0]);
    $BodyText = join("\n", $Lines[0]);
    warn '$BodyText: ' . $BodyText;
    warn 'utf-8 $BodyText: ' . is_utf8($BodyText);
    Dump($BodyText);

I get:

    $Lines[0]: Hej mor,
    utf-8 $Lines[0]: 1
    SV = PV(0x9a051a4) at 0xa27f828
      REFCNT = 1
      FLAGS = (POK,pPOK,UTF8)
      PV = 0xa2f0008 "Hej mor,"\0 [UTF8 "Hej mor,"]
      CUR = 8
      LEN = 9

Which looks fine, but then the joined result:

    $BodyText: Hej mor,
    utf-8 $BodyText:  at /home/mod_perl/hm/Data/Store/Mailbox.pm line 400.
    SV = PVMG(0xa279140) at 0x8cb9228
      REFCNT = 1
      FLAGS = (PADBUSY,PADMY,GMG,SMG,pPOK)
      IV = 0
      NV = 0
      PV = 0xa2bbf50 "Hej mor,\n"\0
      CUR = 9
      LEN = 408
      MAGIC = 0xa397cd8
        MG_VIRTUAL = &PL_vtbl_taint
        MG_TYPE = PERL_MAGIC_taint(t)

Ouch, that seems wrong. No utf-8 flag, and the string seems to be marked as tainted, even though the inputs aren't?

I thought maybe it had something to do with the fact that $BodyText had been assigned to earlier and obviously was tainted, and wasn't losing the taint when the new value was assigned to it. So I changed to:

    $#Lines = 0;
    warn '$Lines[0]: ' . $Lines[0];
    warn 'utf-8 $Lines[0]: ' . is_utf8($Lines[0]);
    Dump($Lines[0]);
    my $NewBodyText = join("\n", $Lines[0]);
    warn '$NewBodyText: ' . $NewBodyText;
    warn 'utf-8 $NewBodyText: ' .
        is_utf8($NewBodyText);
    Dump($NewBodyText);

Which gives:

    $Lines[0]: Hej mor, at /home/mod_perl/hm/Data/Store/Mailbox.pm line 393.
    utf-8 $Lines[0]: 1 at /home/mod_perl/hm/Data/Store/Mailbox.pm line 394.
    SV = PV(0x99f7a94) at 0xa386e68
      REFCNT = 1
      FLAGS = (POK,pPOK,UTF8)
      PV = 0xa2bc188 "Hej mor,"\0 [UTF8 "Hej mor,"]
      CUR = 8
      LEN = 9
    $BodyText: Hej mor,
    utf-8 $BodyText:  at /home/mod_perl/hm/Data/Store/Mailbox.pm line 400.
    SV = PVMG(0xa3b61a8) at 0xa346cc0
      REFCNT = 1
      FLAGS = (PADBUSY,PADMY,POK,pPOK)
      IV = 0
      NV = 0
      PV = 0xa3dde10 "Hej mor,\n"\0
      CUR = 9
      LEN = 162

Ah, so the magic taint stuff is now gone (though it is still a PVMG rather than a PV?), but it still doesn't have the UTF-8 flag set. The fact that this string doesn't have any utf-8 chars isn't the problem; it happens on all of them, even those that do have utf-8 chars.

There is no 'use bytes' or anything at the top of the module, so I don't think that's the problem, though I don't think that should actually affect things, should it, since it only controls how the actual source code is interpreted? I tried explicitly doing 'use utf8' to check, but no difference.

Testing with a small standalone program from the command line, it does seem to work as expected:

    perl -e 'use Devel::Peek; $a="\x{1234}"; @a = ("a", $a, "b"); $c = join "d", @a; Dump($c);'
    SV = PV(0x811ee40) at 0x81318d0
      REFCNT = 1
      FLAGS = (POK,pPOK,UTF8)
      PV = 0x812e0e8 "ad\341\210\264db"\0 [UTF8 "ad\x{1234}db"]
      CUR = 7
      LEN = 8

Which actually raises a general perl question I just wanted to check. If you have two strings and concat them, and one has the utf-8 flag and the other doesn't, does the resultant string have the utf-8 flag set? Assuming that the non-utf8 flagged string is ASCII, this will work fine. If it has chars > 127 in it though, it'll create a rubbish string...

Ok, so to summarise, I think I see two problems here:

1. Assigning an untainted value to a variable that was previously tainted leaves the new value tainted
2. join with utf-8 strings doesn't seem to leave the joined string with the utf-8 flag on

Seems all a bit weird to me...

Rob
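
On the general concatenation question above, here is a minimal standalone sketch. It runs under plain perl with no mod_perl, so it says nothing about the handler problem itself. All variable names are made up for illustration, and it assumes perl 5.8.1 or later, where utf8::is_utf8 and utf8::decode are available without loading a module. It suggests that the non-flagged string is upgraded as if it were Latin-1: chars > 127 survive as characters, but raw UTF-8 encoded bytes come out double-encoded unless they are decoded first.

```perl
use strict;
use warnings;

# Illustrative values only; none of these come from the original module.
my $wide  = "\x{1234}";   # code point > 255, so perl stores it with the UTF-8 flag
my $ascii = "abc";        # plain ASCII, no UTF-8 flag
my $latin = "caf\xe9";    # unflagged string with one char > 127 (e-acute in Latin-1)
my $bytes = "\xc3\xa9";   # the UTF-8 *encoding* of e-acute, held as two unflagged bytes

# join and "." both upgrade the unflagged operand when the other is flagged.
my $joined = join "\n", $ascii, $wide;
my $concat = $latin . $wide;

# Unflagged UTF-8 bytes are treated as two Latin-1 characters: double encoding.
my $mixed = $bytes . $wide;

# Decoding the bytes first turns them into the single intended character.
my $decoded = $bytes;
utf8::decode($decoded);
my $fixed = $decoded . $wide;

printf "joined flagged: %d\n", (utf8::is_utf8($joined) ? 1 : 0);
printf "concat flagged: %d, length %d\n",
    (utf8::is_utf8($concat) ? 1 : 0), length($concat);
printf "char 3 of concat: U+%04X\n", ord(substr($concat, 3, 1));
printf "mixed length: %d, fixed length: %d\n", length($mixed), length($fixed);
```

In this sketch the Latin-1 char keeps its code point (U+00E9) after the upgrade, while the undecoded bytes contribute two characters instead of one. That second case is probably the "rubbish string" scenario: the data was already UTF-8 encoded but never flagged.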