I have tried setting it via the Apache SetEnv directive, as well as in my environment as root when starting Apache. In both cases the variable is correctly set in mod_perl; it is just ignored.
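The thread turns on the difference between the variable being *set* and the variable being *honoured*. A small diagnostic can show both side by side:

- the raw `$ENV{PERL_UNICODE}`;
- the value Perl parsed into `${^UNICODE}`;
- the layers PerlIO actually pushed onto a handle opened without an explicit layer, read with the built-in `PerlIO::get_layers`.

It also reproduces the arithmetic behind the `63` reported further down: S (7) + D (24) + A (32). This is a minimal sketch. `expected_unicode_bits` and `unicode_report` are hypothetical names, and the letter table covers only the letters perlrun listed in the 5.8 era.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Bit values of the -C / PERL_UNICODE letters as listed in perlrun.
# Newer perls define further letters; only these are handled here.
my %UNICODE_BITS = (
    I => 1,  O => 2,  E => 4,  S => 7,    # S = I + O + E (STDIN/STDOUT/STDERR)
    i => 8,  o => 16, D => 24,            # D = i + o (default open() layers)
    A => 32, L => 64,                     # A = @ARGV; L = only under a UTF-8 locale
);

# Hypothetical helper: the value ${^UNICODE} should report for a given
# PERL_UNICODE string, or undef if the string contains an unknown letter.
sub expected_unicode_bits {
    my ($spec) = @_;
    return undef unless defined $spec;
    return 0 + $spec if $spec =~ /^\d+$/;   # numeric form, e.g. "63"
    $spec = 'SDL' if $spec eq '';           # empty string is documented as -CSDL
    my $bits = 0;
    for my $letter (split //, $spec) {
        return undef unless exists $UNICODE_BITS{$letter};
        $bits |= $UNICODE_BITS{$letter};
    }
    return $bits;
}

# Hypothetical helper: one log line comparing %ENV, the parsed value and the
# layers actually present on a handle opened without an explicit layer.
sub unicode_report {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    my @layers = PerlIO::get_layers($fh);
    close($fh);
    my $env      = $ENV{PERL_UNICODE};
    my $expected = expected_unicode_bits($env);
    return sprintf 'PERL_UNICODE=%s expected=%s ${^UNICODE}=%d layers=%s',
        defined $env      ? $env      : '(unset)',
        defined $expected ? $expected : '?',
        ${^UNICODE},
        join(',', @layers);
}

# Demo against a scratch file; inside a handler you would log this instead.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
close($tmp_fh);
print unicode_report($tmp_path), "\n";
```

With PERL_UNICODE=SDA, `expected` and `${^UNICODE}` should both read 63. The more telling field is whether a `utf8` entry appears in the layer list when `unicode_report` is called from inside the handler rather than from a standalone script.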
As another test, I tried the same code as a plain ol' CGI script, and it works in that case. So the issue is definitely with mod_perl and its interaction with the PERL_UNICODE env variable.

Thanks for your help investigating. I was worried that it might be a mod_perl 1.x thing or a Perl version thing. Good to know it isn't just my setup :)

Rgrds,
Rob

On Wed, Mar 19, 2008 at 11:35 AM, André Warnier <[EMAIL PROTECTED]> wrote:
> Hi.
>
> I cannot really think of a reason why Perl itself would do something
> different in either case. And in your tests, it was verified that
> PERL_UNICODE itself is still set right under mod_perl. So it must be
> that mod_perl somehow overrides the basic Perl setting. Maybe mod_perl
> needs to do something re the filehandles, because some of them might be
> connected to Apache?
>
> Anyhow, I am out of my depth now, so let's call on a real mod_perl guru,
> if any of them is around.
>
> By the way:
> I have tried the same thing in the meantime under Apache 2.x/mod_perl
> 2.x, and I seem to have the same problem.
>
> I have one more question: where exactly do you set PERL_UNICODE?
>
>
> Rob French wrote:
> > Hi André,
> >
> > Yes, I tried that as well and it worked as expected (UTF-8 flag is
> > set). Explicit PerlIO layer decoding works in both the non-mod_perl
> > and mod_perl tests. It seems only the default PERL_UNICODE setting is
> > ignored in mod_perl even though it is set.
> >
> > Rgrds,
> > Rob
> >
> > On Wed, Mar 19, 2008 at 3:01 AM, André Warnier <[EMAIL PROTECTED]> wrote:
> >> Hi.
> >>
> >> Perl's handling of Unicode (and of character sets in general) is
> >> extremely clever and powerful.
> >> But it can sometimes be a bit counter-intuitive.
> >>
> >> In any case, it seems to me that the evaluation of the PERL_UNICODE
> >> environment variable is a "Perl thing" rather than a "mod_perl thing",
> >> and that mod_perl per se should not interfere with it.
> >> But maybe mod_perl does some magic on filehandles in general which
> >> interferes, who knows?
> >>
> >> Maybe the first thing to do is to ascertain whether the problem is
> >> really due to a mishandling of the PERL_UNICODE environment variable,
> >> or to something else. I propose a simple test:
> >> instead of relying on the PERL_UNICODE variable, what happens when you
> >> change the open() statement as follows?
> >>
> >>     open(FH, '<:utf8', "/tmp/utf8.txt");
> >>
> >> This explicitly sets a UTF-8 decoding layer for the stream FH,
> >> instead of relying on PERL_UNICODE.
> >> Does your follow-up test then indicate that the utf8 flag for $var is
> >> set?
> >>
> >> Note: even with the decoding layer set, that does not necessarily mean
> >> that all data you read will end up with the utf8 flag set. It depends
> >> on the data. But in your case, if you are really using the same file
> >> data in both tests you show below, then it seems a valid test.
> >>
> >> André
> >>
> >>
> >> Rob French wrote:
> >> > I have recently started converting one of our webapps to make it fully
> >> > UTF-8 compliant. All input/output from the webapp will be encoded as
> >> > UTF-8. As such, I am trying to use the PERL_UNICODE env variable to
> >> > enable UTF-8 flagging on all input/output streams. This works with
> >> > standalone Perl scripts like the one below (the /tmp/utf8.txt file
> >> > contains a single character, U+00E6 LATIN SMALL LETTER AE):
> >> >
> >> > #!/usr/bin/perl -w
> >> >
> >> > use strict;
> >> > use Encode;
> >> >
> >> > print "PERL_UNICODE Value: ${^UNICODE}\n";
> >> > open(FH, "</tmp/utf8.txt") or die "Cannot open /tmp/utf8.txt: $!";
> >> > local $/;    # slurp mode: read the whole file in one go
> >> > my $var = <FH>;
> >> > close(FH);
> >> >
> >> > print "Flagged as UTF8? " . Encode::is_utf8($var) . "\n";
> >> > exit;
> >> >
> >> > The resulting output after setting my PERL_UNICODE env var to SDA is:
> >> >
> >> > PERL_UNICODE Value: 63
> >> > Flagged as UTF8? 1
> >> >
> >> > This is correct.
> >> > Perl processed the input stream (open) as UTF-8 and
> >> > flagged it accordingly.
> >> >
> >> > Unfortunately, if I put the exact same open call in my mod_perl
> >> > TransHandler, $var is not flagged as UTF-8. The resulting output when
> >> > run in the TransHandler is:
> >> >
> >> > PERL_UNICODE Value: 63
> >> > Flagged as UTF8?
> >> >
> >> > The input stream is not processed as UTF-8 and not flagged internally
> >> > as UTF-8. If I explicitly add an Encode::decode_utf8($var) in mod_perl,
> >> > then everything works as expected. It appears as if mod_perl is
> >> > ignoring the PERL_UNICODE env variable and not processing my input
> >> > streams as UTF-8.
> >> >
> >> > Thanks in advance.
> >> >
> >> > Cheers
> >> >
> >> >
> >> > Environment details below:
> >> >
> >> > Summary of my perl5 (revision 5 version 8 subversion 5) configuration:
> >> >   Platform:
> >> >     osname=linux, osvers=2.6.9-22.18.bz155725.elsmp,
> >> > archname=i386-linux-thread-multi
> >> >     uname='linux hs20-bc1-4.build.redhat.com
> >> > 2.6.9-22.18.bz155725.elsmp #1 smp thu nov 17 15:34:08 est 2005 i686
> >> > i686 i386 gnulinux '
> >> >     config_args='-des -Doptimize=-O2 -g -pipe -m32 -march=i386
> >> > -mtune=pentium4 -Dversion=5.8.5 -Dmyhostname=localhost
> >> > [EMAIL PROTECTED] -Dcc=gcc -Dcf_by=Red Hat, Inc.
> >> > -Dinstallprefix=/usr -Dprefix=/usr -Darchname=i386-linux
> >> > -Dvendorprefix=/usr -Dsiteprefix=/usr -Duseshrplib -Dusethreads
> >> > -Duseithreads -Duselargefiles -Dd_dosuid -Dd_semctl_semun -Di_db
> >> > -Ui_ndbm -Di_gdbm -Di_shadow -Di_syslog -Dman3ext=3pm -Duseperlio
> >> > -Dinstallusrbinperl -Ubincompat5005 -Uversiononly
> >> > -Dpager=/usr/bin/less -isr -Dinc_version_list=5.8.4 5.8.3 5.8.2 5.8.1
> >> > 5.8.0'
> >> >     hint=recommended, useposix=true, d_sigaction=define
> >> >     usethreads=define use5005threads=undef useithreads=define
> >> > usemultiplicity=define
> >> >     useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
> >> >     use64bitint=undef use64bitall=undef uselongdouble=undef
> >> >     usemymalloc=n, bincompat5005=undef
> >> >   Compiler:
> >> >     cc='gcc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBUGGING
> >> > -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE
> >> > -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm',
> >> >     optimize='-O2 -g -pipe -m32 -march=i386 -mtune=pentium4',
> >> >     cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBUGGING
> >> > -fno-strict-aliasing -pipe -I/usr/local/include -I/usr/include/gdbm'
> >> >     ccversion='', gccversion='3.4.6 20060404 (Red Hat 3.4.6-2)', gccosandvers=''
> >> >     intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
> >> >     d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
> >> >     ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t',
> >> > lseeksize=8
> >> >     alignbytes=4, prototype=define
> >> >   Linker and Libraries:
> >> >     ld='gcc', ldflags =' -L/usr/local/lib'
> >> >     libpth=/usr/local/lib /lib /usr/lib
> >> >     libs=-lresolv -lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lpthread -lc
> >> >     perllibs=-lresolv -lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
> >> >     libc=/lib/libc-2.3.4.so, so=so, useshrplib=true, libperl=libperl.so
> >> >     gnulibc_version='2.3.4'
> >> >   Dynamic Linking:
> >> >     dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E
> >> > -Wl,-rpath,/usr/lib/perl5/5.8.5/i386-linux-thread-multi/CORE'
> >> >     cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'
> >> >
> >> >
> >> > Characteristics of this binary (from libperl):
> >> >   Compile-time options: DEBUGGING MULTIPLICITY USE_ITHREADS
> >> >                         USE_LARGE_FILES PERL_IMPLICIT_CONTEXT
> >> >   Built under linux
> >> >   Compiled at Jul 24 2006 18:28:10
> >> >   @INC:
> >> >     /usr/lib/perl5/5.8.5/i386-linux-thread-multi
> >> >     /usr/lib/perl5/5.8.5
> >> >     /usr/lib/perl5/site_perl/5.8.5/i386-linux-thread-multi
> >> >     /usr/lib/perl5/site_perl/5.8.4/i386-linux-thread-multi
> >> >     /usr/lib/perl5/site_perl/5.8.3/i386-linux-thread-multi
> >> >     /usr/lib/perl5/site_perl/5.8.2/i386-linux-thread-multi
> >> >     /usr/lib/perl5/site_perl/5.8.1/i386-linux-thread-multi
> >> >     /usr/lib/perl5/site_perl/5.8.0/i386-linux-thread-multi
> >> >     /usr/lib/perl5/site_perl/5.8.5
> >> >     /usr/lib/perl5/site_perl/5.8.4
> >> >     /usr/lib/perl5/site_perl/5.8.3
> >> >     /usr/lib/perl5/site_perl/5.8.2
> >> >     /usr/lib/perl5/site_perl/5.8.1
> >> >     /usr/lib/perl5/site_perl/5.8.0
> >> >     /usr/lib/perl5/site_perl
> >> >     /usr/lib/perl5/vendor_perl/5.8.5/i386-linux-thread-multi
> >> >     /usr/lib/perl5/vendor_perl/5.8.4/i386-linux-thread-multi
> >> >     /usr/lib/perl5/vendor_perl/5.8.3/i386-linux-thread-multi
> >> >     /usr/lib/perl5/vendor_perl/5.8.2/i386-linux-thread-multi
> >> >     /usr/lib/perl5/vendor_perl/5.8.1/i386-linux-thread-multi
> >> >     /usr/lib/perl5/vendor_perl/5.8.0/i386-linux-thread-multi
> >> >     /usr/lib/perl5/vendor_perl/5.8.5
> >> >     /usr/lib/perl5/vendor_perl/5.8.4
> >> >     /usr/lib/perl5/vendor_perl/5.8.3
> >> >     /usr/lib/perl5/vendor_perl/5.8.2
> >> >     /usr/lib/perl5/vendor_perl/5.8.1
> >> >     /usr/lib/perl5/vendor_perl/5.8.0
> >> >     /usr/lib/perl5/vendor_perl
> >> >     .
> >> > mod_perl version: 1.30
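One possible explanation, which was not confirmed in this thread, comes from perlrun in newer Perl releases. It describes the i/o letters of -C / PERL_UNICODE (`i`, `o`, `D`) as setting default layers only for open() calls in the main program scope, with no effect on code run in modules. If that also holds for 5.8.5, it fits what Rob saw:

- the standalone script and the plain CGI script are main programs, so they get the layer;
- a mod_perl TransHandler is loaded as a module, so it does not, even though `${^UNICODE}` still reports 63.

Either way, the approach both posters found to work is to name the decoding layer on each handle. The sketch below does that. `read_utf8_file` is a hypothetical helper name. It uses `:encoding(UTF-8)` rather than `:utf8` on the assumption that validating malformed input is preferable.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: slurp a file as decoded characters by naming the
# decoding layer explicitly, instead of relying on PERL_UNICODE / -CD.
sub read_utf8_file {
    my ($path) = @_;
    open(my $fh, '<:encoding(UTF-8)', $path)
        or die "Cannot open $path: $!";
    local $/;    # slurp mode; the previous $/ is restored when the sub returns
    my $text = <$fh>;
    close($fh);
    return $text;
}

# Demo: write U+00E6 (LATIN SMALL LETTER AE) as its two raw UTF-8 bytes,
# 0xC3 0xA6, to a scratch file that is removed at program exit.
my ($out, $path) = tempfile(UNLINK => 1);
binmode($out, ':raw');
print {$out} "\xC3\xA6";
close($out);

my $text = read_utf8_file($path);
printf "length=%d, U+%04X, flagged=%d\n",
    length($text), ord($text), utf8::is_utf8($text) ? 1 : 0;
# prints "length=1, U+00E6, flagged=1"
```

Using `local $/` instead of `undef $/` matters more under mod_perl than in a one-off script. Because the interpreter persists between requests, a global `undef $/` would leak into every later request served by that process. The `open` pragma (for example `use open IN => ':encoding(UTF-8)';` at the top of the handler module) is a per-file alternative to repeating the layer on every open() call.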