Hi André,
Yes, I tried that as well and it worked as expected (UTF-8 flag is
set). Explicit PerlIO layer decoding works in both the non-mod_perl
and mod_perl tests. It seems that only the default layers requested
through PERL_UNICODE are ignored under mod_perl, even though the
variable is set.
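
For anyone who wants to repeat the comparison, here is a minimal sketch
of that kind of check. It is not the exact handler code: the scratch
file from File::Temp stands in for /tmp/utf8.txt, and the helper names
make_sample and slurp_with are made up for illustration. It reads the
same two bytes once through an explicit :raw layer and once through an
explicit :utf8 layer, so the result does not depend on PERL_UNICODE.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write the two UTF-8 bytes that encode U+00E6 to a scratch file
# (hypothetical stand-in for /tmp/utf8.txt).
sub make_sample {
    my ($fh, $path) = tempfile(UNLINK => 1);
    binmode($fh, ':raw');              # make sure the bytes go out untouched
    print {$fh} "\xC3\xA6";
    close($fh) or die "close failed: $!";
    return $path;
}

# Slurp a file using an explicit open mode such as '<:raw' or '<:utf8'.
sub slurp_with {
    my ($mode, $path) = @_;
    open(my $fh, $mode, $path) or die "Cannot open $path: $!";
    local $/;                          # slurp mode, restored on scope exit
    my $data = <$fh>;
    close($fh);
    return $data;
}

my $path  = make_sample();
my $bytes = slurp_with('<:raw',  $path);
my $chars = slurp_with('<:utf8', $path);

printf "raw:  length=%d flagged=%d\n",
    length($bytes), utf8::is_utf8($bytes) ? 1 : 0;
printf "utf8: length=%d flagged=%d\n",
    length($chars), utf8::is_utf8($chars) ? 1 : 0;
```

With the explicit :utf8 layer the string comes back as one flagged
character; with :raw it stays as two unflagged bytes.
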
Rgrds,
Rob
On Wed, Mar 19, 2008 at 3:01 AM, André Warnier <[EMAIL PROTECTED]> wrote:
> Hi.
>
> Perl's handling of Unicode (and of character sets in general) is
> extremely clever and powerful.
> But it can sometimes be a bit counter-intuitive.
>
> In any case, it seems to me that the evaluation of the PERL_UNICODE
> environment variable is a "Perl thing" rather than a "mod_perl thing",
> and that mod_perl per se should not interfere with it. But maybe
> mod_perl does some magic on filehandles in general which interferes, who
> knows?
>
> Maybe the first thing to do is to ascertain that the problem is really
> due to a mishandling of the PERL_UNICODE environment variable, or
> something else. I propose a simple test:
> Instead of relying on the PERL_UNICODE variable, what happens when you
> change the open() statement as follows:
>
>     open(FH, '<:utf8', "/tmp/utf8.txt");
>
> thus explicitly setting a UTF-8 decoding layer for the stream FH,
> instead of relying on PERL_UNICODE.
> Does your follow-up test then indicate that the utf8 flag for $var is set?
>
> Note: even with the decoding layer set, that does not necessarily mean
> that all data you read will end up with the utf8 flag set. It depends
> on the data. But in your case, if you are really using the same file
> data in both tests you show below, then it seems a valid test.
>
> André
>
>
>
>
> Rob French wrote:
> > I have recently started converting one of our webapps to make it fully
> > UTF-8 compliant. All input/output from the webapp will be encoded as
> > UTF-8. As such, I am trying to use the PERL_UNICODE env variable to
> > enable UTF-8 flagging on all input/output streams. This works with
> > standalone Perl scripts like the one below (the /tmp/utf8.txt file
> > contains a single character, U+00E6 LATIN SMALL LETTER AE):
> >
> > #!/usr/bin/perl -w
> >
> > use strict;
> > use Encode;
> >
> > print "PERL_UNICODE Value: ${^UNICODE}\n";
> > open(FH, "</tmp/utf8.txt") or die "Cannot open /tmp/utf8.txt: $!";
> > undef $/;
> > my $var = <FH>;
> > close(FH);
> >
> > print "Flagged as UTF8? " . Encode::is_utf8($var) . "\n";
> > exit;
> >
> > The resulting output after setting my PERL_UNICODE env var to SDA is:
> >
> > PERL_UNICODE Value: 63
> > Flagged as UTF8? 1
> >
> > Which is correct. Perl processed the input stream (open) as UTF-8 and
> > flagged it accordingly.
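
The value 63 is the sum of the flag bits behind the letters S, D and A.
As a rough sketch, the mapping can be reproduced with the letter values
listed for the -C switch in perlrun; the helper name
unicode_flags_to_number is made up for illustration and is not part of
Perl.

```perl
use strict;
use warnings;

# Letter values as documented for the -C switch / PERL_UNICODE in perlrun.
my %FLAG = (
    I => 1,      # STDIN is assumed to be UTF-8
    O => 2,      # STDOUT will be UTF-8
    E => 4,      # STDERR will be UTF-8
    S => 7,      # I + O + E
    i => 8,      # UTF-8 is the default layer for input streams
    o => 16,     # UTF-8 is the default layer for output streams
    D => 24,     # i + o
    A => 32,     # @ARGV elements are expected to be UTF-8
    L => 64,     # make the above conditional on the locale
    a => 256,    # set ${^UTF8CACHE} to -1 (debugging aid)
);

# Hypothetical helper: turn a letter string such as "SDA" into the
# number that ${^UNICODE} reports.
sub unicode_flags_to_number {
    my ($letters) = @_;
    my $value = 0;
    for my $letter (split //, $letters) {
        die "Unknown PERL_UNICODE letter '$letter'\n"
            unless exists $FLAG{$letter};
        $value |= $FLAG{$letter};
    }
    return $value;
}

print unicode_flags_to_number('SDA'), "\n";    # 7 | 24 | 32 = 63
```

The D part (24) is the piece that matters for the open() call here,
since it covers the default layers for newly opened streams.
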
> >
> > Unfortunately, if I put the exact same open call in my mod_perl
> > TransHandler, $var is not flagged as UTF-8. The resulting output when
> > run in the TransHandler is:
> >
> > PERL_UNICODE Value: 63
> > Flagged as UTF8?
> >
> > The input stream is not processed as UTF-8 and not flagged internally
> > as UTF-8. If I explicitly add an Encode::decode_utf8($var) in mod_perl
> > then everything works as expected. It appears as if mod_perl is
> > ignoring the PERL_UNICODE env variable and not processing my input
> > streams as UTF-8.
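
A minimal sketch of the explicit-decode workaround mentioned above,
assuming the handle delivered undecoded bytes. The literal byte string
stands in for what was read from the file, and the wrapper name
decode_bytes is made up for illustration; only Encode::decode_utf8
comes from the original message.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical wrapper: decode a UTF-8 byte string into a character string.
sub decode_bytes {
    my ($bytes) = @_;                  # work on a copy of the caller's data
    return Encode::decode_utf8($bytes);
}

my $bytes = "\xC3\xA6";                # UTF-8 encoding of U+00E6, undecoded
my $chars = decode_bytes($bytes);

printf "bytes: length=%d flagged=%d\n",
    length($bytes), utf8::is_utf8($bytes) ? 1 : 0;
printf "chars: length=%d flagged=%d ord=U+%04X\n",
    length($chars), utf8::is_utf8($chars) ? 1 : 0, ord($chars);
```

Decoding by hand like this works per string, whereas a layer on the
handle (explicit or via PERL_UNICODE) decodes everything read from it.
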
> >
> > Thanks in advance.
> >
> > Cheers
> >
> >
> >
> >
> > Environment details below:
> >
> > Summary of my perl5 (revision 5 version 8 subversion 5) configuration:
> > Platform:
> > osname=linux, osvers=2.6.9-22.18.bz155725.elsmp,
> > archname=i386-linux-thread-multi
> > uname='linux hs20-bc1-4.build.redhat.com
> > 2.6.9-22.18.bz155725.elsmp #1 smp thu nov 17 15:34:08 est 2005 i686
> > i686 i386 gnulinux '
> > config_args='-des -Doptimize=-O2 -g -pipe -m32 -march=i386
> > -mtune=pentium4 -Dversion=5.8.5 -Dmyhostname=localhost
> > [EMAIL PROTECTED] -Dcc=gcc -Dcf_by=Red Hat, Inc.
> > -Dinstallprefix=/usr -Dprefix=/usr -Darchname=i386-linux
> > -Dvendorprefix=/usr -Dsiteprefix=/usr -Duseshrplib -Dusethreads
> > -Duseithreads -Duselargefiles -Dd_dosuid -Dd_semctl_semun -Di_db
> > -Ui_ndbm -Di_gdbm -Di_shadow -Di_syslog -Dman3ext=3pm -Duseperlio
> > -Dinstallusrbinperl -Ubincompat5005 -Uversiononly
> > -Dpager=/usr/bin/less -isr -Dinc_version_list=5.8.4 5.8.3 5.8.2 5.8.1
> > 5.8.0'
> > hint=recommended, useposix=true, d_sigaction=define
> > usethreads=define use5005threads=undef useithreads=define
> > usemultiplicity=define
> > useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
> > use64bitint=undef use64bitall=undef uselongdouble=undef
> > usemymalloc=n, bincompat5005=undef
> > Compiler:
> > cc='gcc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBUGGING
> > -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE
> > -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm',
> > optimize='-O2 -g -pipe -m32 -march=i386 -mtune=pentium4',
> > cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBUGGING
> > -fno-strict-aliasing -pipe -I/usr/local/include -I/usr/include/gdbm'
> > ccversion='', gccversion='3.4.6 20060404 (Red Hat 3.4.6-2)', gccosandvers=''
> > intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
> > d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
> > ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t',
> > lseeksize=8
> > alignbytes=4, prototype=define
> > Linker and Libraries:
> > ld='gcc', ldflags =' -L/usr/local/lib'
> > libpth=/usr/local/lib /lib /usr/lib
> > libs=-lresolv -lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lpthread -lc
> > perllibs=-lresolv -lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
> > libc=/lib/libc-2.3.4.so, so=so, useshrplib=true, libperl=libperl.so
> > gnulibc_version='2.3.4'
> > Dynamic Linking:
> > dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E
> > -Wl,-rpath,/usr/lib/perl5/5.8.5/i386-linux-thread-multi/CORE'
> > cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'
> >
> >
> > Characteristics of this binary (from libperl):
> > Compile-time options: DEBUGGING MULTIPLICITY USE_ITHREADS
> > USE_LARGE_FILES PERL_IMPLICIT_CONTEXT
> > Built under linux
> > Compiled at Jul 24 2006 18:28:10
> > @INC:
> > /usr/lib/perl5/5.8.5/i386-linux-thread-multi
> > /usr/lib/perl5/5.8.5
> > /usr/lib/perl5/site_perl/5.8.5/i386-linux-thread-multi
> > /usr/lib/perl5/site_perl/5.8.4/i386-linux-thread-multi
> > /usr/lib/perl5/site_perl/5.8.3/i386-linux-thread-multi
> > /usr/lib/perl5/site_perl/5.8.2/i386-linux-thread-multi
> > /usr/lib/perl5/site_perl/5.8.1/i386-linux-thread-multi
> > /usr/lib/perl5/site_perl/5.8.0/i386-linux-thread-multi
> > /usr/lib/perl5/site_perl/5.8.5
> > /usr/lib/perl5/site_perl/5.8.4
> > /usr/lib/perl5/site_perl/5.8.3
> > /usr/lib/perl5/site_perl/5.8.2
> > /usr/lib/perl5/site_perl/5.8.1
> > /usr/lib/perl5/site_perl/5.8.0
> > /usr/lib/perl5/site_perl
> > /usr/lib/perl5/vendor_perl/5.8.5/i386-linux-thread-multi
> > /usr/lib/perl5/vendor_perl/5.8.4/i386-linux-thread-multi
> > /usr/lib/perl5/vendor_perl/5.8.3/i386-linux-thread-multi
> > /usr/lib/perl5/vendor_perl/5.8.2/i386-linux-thread-multi
> > /usr/lib/perl5/vendor_perl/5.8.1/i386-linux-thread-multi
> > /usr/lib/perl5/vendor_perl/5.8.0/i386-linux-thread-multi
> > /usr/lib/perl5/vendor_perl/5.8.5
> > /usr/lib/perl5/vendor_perl/5.8.4
> > /usr/lib/perl5/vendor_perl/5.8.3
> > /usr/lib/perl5/vendor_perl/5.8.2
> > /usr/lib/perl5/vendor_perl/5.8.1
> > /usr/lib/perl5/vendor_perl/5.8.0
> > /usr/lib/perl5/vendor_perl
> > .
> > mod_perl version: 1.30
> >
>