>>>>> "Randal" == Randal L Schwartz <mer...@stonehenge.com> writes:
Randal> Getting really frustrated with mod_perl2's apparent inability to
Randal> properly read UTF8 input.

Randal> Here's my mod_perl2 setup:

Randal>   Apache 2.2.[something]
Randal>   mod_perl 2.0.7 (or nearly that)
Randal>   ModPerl::Registry
Randal>   Perl "script" with CGI.pm

Randal> Very early in my app:

Randal>   ## ensure utf8 CGI params:
Randal>   $CGI::PARAM_UTF8 = 1;

Randal>   binmode STDIN, ":utf8";
Randal>   binmode STDOUT, ":utf8";
Randal>   binmode STDERR, ":utf8";

Randal> This works fine in CGI mode: when I ask for $foo = $cgi->param('foo'),
Randal> DBI::data_string_desc($foo) shows a UTF8 string with the proper
Randal> discrepancy between bytes and chars.

Randal> But when I try to run it under mod_perl, the returned string appears
Randal> to be the raw, undecoded bytes, and definitely not utf8 chars.  Of
Randal> course, when I store that in the database (using DBD::Pg), those bytes
Randal> are treated as "latin-1" and encoded to "utf-8" a second time, and I
Randal> get a bunch of weird chars on the output.

Randal> Has anyone managed to round-trip UTF8 from form to database and back
Randal> using a setup similar to this?

Randal> I suspect part of the problem is this in CGI.pm:

Randal>     'read_from_client' => <<'END_OF_FUNC',
Randal>     # Read data from a file handle
Randal>     sub read_from_client {
Randal>     my($self, $buff, $len, $offset) = @_;
Randal>     local $^W=0;                # prevent a warning
Randal>     return $MOD_PERL
Randal>         ? $self->r->read($$buff, $len, $offset)
Randal>             : read(\*STDIN, $$buff, $len, $offset);
Randal>     }
Randal>     END_OF_FUNC

Randal> Since I binmode STDIN, the non-$MOD_PERL works ok here.  What's the
Randal> equivalent of $r->read() that marks the incoming stream as UTF8, so I
Randal> get chars instead of bytes?  Or can I just read(\*STDIN) in mod_perl2
Randal> as well? (I know that was supported at one point...)
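
For anyone following along, here is a minimal sketch of the symptom
described above, using only the core Encode module.  The sample byte
string and the variable names are mine, purely for illustration; it is
not what CGI.pm or DBD::Pg do internally, just the same effect in
miniature.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# "e with acute accent" as it arrives on the wire: two UTF-8 bytes.
my $raw = "\xC3\xA9";

# Decoding turns the two bytes into one character, U+00E9.
my $chars = decode('UTF-8', $raw);

# length() counts characters, so the two strings differ.
my $raw_len   = length $raw;
my $chars_len = length $chars;

# Correct round trip: encode the decoded characters.
my $good = encode('UTF-8', $chars);

# Double encoding: the undecoded bytes are taken as two Latin-1
# characters (U+00C3, U+00A9) and each one is encoded again.
my $bad = encode('UTF-8', $raw);

# Helper (illustrative only) to show a byte string as hex.
sub hex_bytes { join ' ', map { sprintf '%02X', ord } split //, shift }

print "raw length:     $raw_len\n";
print "decoded length: $chars_len\n";
print "good bytes:     ", hex_bytes($good), "\n";
print "bad bytes:      ", hex_bytes($bad), "\n";
```

The "bad" case is the bunch of weird chars: four bytes where there
should be two.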

I realized that I never posted my eventual solution.  I monkey-patch
CGI::param so that $CGI::PARAM_UTF8 is set again on every call:

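require CGI;
{
  # Keep a reference to the original CGI::param so the wrapper can call it.
  my $orig = \&CGI::param;
  no warnings 'redefine';
  *CGI::param = sub {
    $CGI::LIST_CONTEXT_WARN = 0; # silence the list-context warning in newer CGI.pm
    $CGI::PARAM_UTF8 = 1;        # re-assert UTF-8 decoding on every call
    goto &$orig;                 # tail-call the original with the same @_
  };
}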

And this has been working just fine for both CGI and mod_perl.  Just for the
record.
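
In case the wrapper trick itself is unfamiliar, here is a rough,
self-contained sketch of why setting the flag inside the wrapper helps.
My::Form, $DECODE and reset_globals are hypothetical stand-ins that I
made up for the example; they are not CGI.pm's real internals.  The
assumption (which I believe matches what happens to $CGI::PARAM_UTF8
under mod_perl) is that something resets the package global between
requests, so a one-time assignment at startup does not stick.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a module with a resettable global flag.
package My::Form;
our $DECODE = 0;
sub reset_globals { $DECODE = 0 }
sub param {
    my ($name) = @_;
    return $DECODE ? "decoded:$name" : "raw:$name";
}

package main;

{
    my $orig = \&My::Form::param;
    no warnings 'redefine';
    *My::Form::param = sub {
        $My::Form::DECODE = 1;   # re-assert the flag on every call
        goto &$orig;             # tail-call the original with the same @_
    };
}

$My::Form::DECODE = 1;           # one-time setting at startup...
My::Form::reset_globals();       # ...wiped by a simulated per-request reset

my $value = My::Form::param('foo');
print "$value\n";                # prints "decoded:foo"
```

Using goto &$orig rather than a plain call keeps @_ and the caller's
context (list or scalar) exactly as the original sub would have seen
them.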

-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<mer...@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix consulting, Technical writing, Comedy, etc. etc.
Still trying to think of something clever for the fourth line of this .sig
