Hi.
Perl's handling of Unicode (and of character sets in general) is
extremely clever and powerful.
But it can sometimes be a bit counter-intuitive.
In any case, it seems to me that evaluating the PERL_UNICODE
environment variable is a "Perl thing" rather than a "mod_perl thing",
and that mod_perl per se should not interfere with it. But maybe
mod_perl does some magic on filehandles in general that interferes;
who knows?
Maybe the first thing to do is to ascertain whether the problem is
really due to a mishandling of the PERL_UNICODE environment variable or
to something else. I propose a simple test: instead of relying on the
PERL_UNICODE variable, what happens when you change the open()
statement as follows:
> open(FH, '<:utf8',"/tmp/utf8.txt");
This explicitly sets a UTF-8 decoding layer on the stream FH.
Does your follow-up test then indicate that the utf8 flag for $var is set?
Note: even with the decoding layer set, not all data you read will
necessarily end up with the utf8 flag set; it depends on the data.
But in your case, if you are really using the same file data in both
tests you show below, it seems a valid test.
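
For what it's worth, here is a minimal, self-contained sketch of the test I have in mind. It is only an illustration: the helper name slurp_utf8 and the temporary file (a stand-in for /tmp/utf8.txt) are my own assumptions, not part of your code.

```perl
use strict;
use warnings;
use Encode ();
use File::Temp qw(tempfile);

# Hypothetical helper: read a whole file through an explicit :utf8 layer,
# so the result does not depend on PERL_UNICODE at all.
sub slurp_utf8 {
    my ($path) = @_;
    open(my $fh, '<:utf8', $path) or die "Cannot open $path: $!";
    local $/;    # slurp mode, restored automatically on scope exit
    my $data = <$fh>;
    close($fh);
    return $data;
}

# Stand-in for /tmp/utf8.txt: the two bytes that encode U+00E6 in UTF-8.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
binmode($tmp_fh);            # write raw bytes, no encoding layer
print {$tmp_fh} "\xC3\xA6";
close($tmp_fh);

my $var = slurp_utf8($tmp_path);
printf "Flagged as UTF8? %s, length %d\n",
    (Encode::is_utf8($var) ? 'yes' : 'no'), length($var);
```

If the same few lines behave differently inside the TransHandler, that would point away from PERL_UNICODE and towards something else in the handler environment.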
André
Rob French wrote:
I have recently started converting one of our webapps to make it fully
UTF-8 compliant. All input/output from the webapp will be encoded as
UTF-8. As such, I am trying to use the PERL_UNICODE env variable to
enable UTF-8 flagging on all input/output streams. This works with
standalone Perl scripts like the one below (the /tmp/utf8.txt file
contains a single character, U+00E6 LATIN SMALL LETTER AE):
#!/usr/bin/perl -w
use strict;
use Encode;
print "PERL_UNICODE Value: ${^UNICODE}\n";
open(FH, "</tmp/utf8.txt") or die "Cannot open /tmp/utf8.txt: $!";
undef $/;
my $var = <FH>;
close(FH);
print "Flagged as UTF8? " . Encode::is_utf8($var) . "\n";
exit;
The resulting output after setting my PERL_UNICODE env var to SDA is:
PERL_UNICODE Value: 63
Flagged as UTF8? 1
Which is correct. Perl processed the input stream (open) as UTF-8 and
flagged it accordingly.
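
As a side note on where the 63 comes from: the letters in PERL_UNICODE are shorthand for bit values that are OR-ed together. The sketch below uses the values as I understand them from the perlrun documentation for -C; the helper name unicode_flags_to_number is hypothetical and only there for illustration.

```perl
use strict;
use warnings;

# Flag values as listed in perlrun for -C / PERL_UNICODE.
my %flag_value = (
    I => 1,  O => 2,  E => 4,  S => 7,     # STDIN, STDOUT, STDERR, all three
    i => 8,  o => 16, D => 24,             # default input/output layers
    A => 32, L => 64, a => 256,
);

# Hypothetical helper: OR together the values of a letter string such as "SDA".
sub unicode_flags_to_number {
    my ($letters) = @_;
    my $bits = 0;
    for my $letter (split //, $letters) {
        die "Unknown PERL_UNICODE flag '$letter'"
            unless exists $flag_value{$letter};
        $bits |= $flag_value{$letter};
    }
    return $bits;
}

print unicode_flags_to_number('SDA'), "\n";    # 7 | 24 | 32 = 63
```

So a value of 63 in ${^UNICODE} only shows that the S, D and A bits were parsed; it does not by itself show that the default layers were applied to a given open().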
Unfortunately, if I put the exact same open call in my mod_perl
TransHandler, $var is not flagged as UTF-8. The resulting output when
run in the TransHandler is:
PERL_UNICODE Value: 63
Flagged as UTF8?
The input stream is not processed as UTF-8 and not flagged internally
as UTF-8. If I explicitly add an Encode::decode_utf8($var) in mod_perl
then everything works as expected. It appears as if mod_perl is
ignoring the PERL_UNICODE env variable and not processing my input
streams as UTF-8.
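
The explicit-decode workaround mentioned above could look roughly like the following. This is a sketch under assumptions, not the actual handler code: the helper name slurp_and_decode and the temporary file standing in for /tmp/utf8.txt are made up for the example.

```perl
use strict;
use warnings;
use Encode ();
use File::Temp qw(tempfile);

# Hypothetical helper: read the file as raw bytes, then decode explicitly,
# so the result does not depend on any default I/O layers.
sub slurp_and_decode {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    binmode($fh);    # make sure no layer decodes the data for us
    local $/;        # slurp mode
    my $bytes = <$fh>;
    close($fh);
    return Encode::decode_utf8($bytes);
}

# Stand-in for /tmp/utf8.txt: U+00E6 encoded as the two bytes C3 A6.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
binmode($tmp_fh);
print {$tmp_fh} "\xC3\xA6";
close($tmp_fh);

my $decoded = slurp_and_decode($tmp_path);
printf "Flagged as UTF8? %s, length %d\n",
    (Encode::is_utf8($decoded) ? 'yes' : 'no'), length($decoded);
```

Decoding by hand like this works regardless of PERL_UNICODE, at the cost of having to remember it for every stream.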
Thanks in advance.
Cheers
Environment details below:
Summary of my perl5 (revision 5 version 8 subversion 5) configuration:
Platform:
osname=linux, osvers=2.6.9-22.18.bz155725.elsmp,
archname=i386-linux-thread-multi
uname='linux hs20-bc1-4.build.redhat.com
2.6.9-22.18.bz155725.elsmp #1 smp thu nov 17 15:34:08 est 2005 i686
i686 i386 gnulinux '
config_args='-des -Doptimize=-O2 -g -pipe -m32 -march=i386
-mtune=pentium4 -Dversion=5.8.5 -Dmyhostname=localhost
[EMAIL PROTECTED] -Dcc=gcc -Dcf_by=Red Hat, Inc.
-Dinstallprefix=/usr -Dprefix=/usr -Darchname=i386-linux
-Dvendorprefix=/usr -Dsiteprefix=/usr -Duseshrplib -Dusethreads
-Duseithreads -Duselargefiles -Dd_dosuid -Dd_semctl_semun -Di_db
-Ui_ndbm -Di_gdbm -Di_shadow -Di_syslog -Dman3ext=3pm -Duseperlio
-Dinstallusrbinperl -Ubincompat5005 -Uversiononly
-Dpager=/usr/bin/less -isr -Dinc_version_list=5.8.4 5.8.3 5.8.2 5.8.1
5.8.0'
hint=recommended, useposix=true, d_sigaction=define
usethreads=define use5005threads=undef useithreads=define
usemultiplicity=define
useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
use64bitint=undef use64bitall=undef uselongdouble=undef
usemymalloc=n, bincompat5005=undef
Compiler:
cc='gcc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBUGGING
-fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE
-D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm',
optimize='-O2 -g -pipe -m32 -march=i386 -mtune=pentium4',
cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBUGGING
-fno-strict-aliasing -pipe -I/usr/local/include -I/usr/include/gdbm'
ccversion='', gccversion='3.4.6 20060404 (Red Hat 3.4.6-2)', gccosandvers=''
intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t',
lseeksize=8
alignbytes=4, prototype=define
Linker and Libraries:
ld='gcc', ldflags =' -L/usr/local/lib'
libpth=/usr/local/lib /lib /usr/lib
libs=-lresolv -lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lpthread -lc
perllibs=-lresolv -lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
libc=/lib/libc-2.3.4.so, so=so, useshrplib=true, libperl=libperl.so
gnulibc_version='2.3.4'
Dynamic Linking:
dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E
-Wl,-rpath,/usr/lib/perl5/5.8.5/i386-linux-thread-multi/CORE'
cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'
Characteristics of this binary (from libperl):
Compile-time options: DEBUGGING MULTIPLICITY USE_ITHREADS
USE_LARGE_FILES PERL_IMPLICIT_CONTEXT
Built under linux
Compiled at Jul 24 2006 18:28:10
@INC:
/usr/lib/perl5/5.8.5/i386-linux-thread-multi
/usr/lib/perl5/5.8.5
/usr/lib/perl5/site_perl/5.8.5/i386-linux-thread-multi
/usr/lib/perl5/site_perl/5.8.4/i386-linux-thread-multi
/usr/lib/perl5/site_perl/5.8.3/i386-linux-thread-multi
/usr/lib/perl5/site_perl/5.8.2/i386-linux-thread-multi
/usr/lib/perl5/site_perl/5.8.1/i386-linux-thread-multi
/usr/lib/perl5/site_perl/5.8.0/i386-linux-thread-multi
/usr/lib/perl5/site_perl/5.8.5
/usr/lib/perl5/site_perl/5.8.4
/usr/lib/perl5/site_perl/5.8.3
/usr/lib/perl5/site_perl/5.8.2
/usr/lib/perl5/site_perl/5.8.1
/usr/lib/perl5/site_perl/5.8.0
/usr/lib/perl5/site_perl
/usr/lib/perl5/vendor_perl/5.8.5/i386-linux-thread-multi
/usr/lib/perl5/vendor_perl/5.8.4/i386-linux-thread-multi
/usr/lib/perl5/vendor_perl/5.8.3/i386-linux-thread-multi
/usr/lib/perl5/vendor_perl/5.8.2/i386-linux-thread-multi
/usr/lib/perl5/vendor_perl/5.8.1/i386-linux-thread-multi
/usr/lib/perl5/vendor_perl/5.8.0/i386-linux-thread-multi
/usr/lib/perl5/vendor_perl/5.8.5
/usr/lib/perl5/vendor_perl/5.8.4
/usr/lib/perl5/vendor_perl/5.8.3
/usr/lib/perl5/vendor_perl/5.8.2
/usr/lib/perl5/vendor_perl/5.8.1
/usr/lib/perl5/vendor_perl/5.8.0
/usr/lib/perl5/vendor_perl
.
mod_perl version: 1.30