mod_perl and utf8 and CGI->param

Randal L. Schwartz Tue, 02 Sep 2014 14:20:38 -0700

Getting really frustrated with mod_perl2's apparent inability to
probably read UTF8 input.


Here's my mod_perl2 setup:

  Apache 2.2.[something]
  mod_perl 2.0.7 (or nearly that)
  ModPerl::Registry
  Perl "script" with CGI.pm

Very early in my app:

  ## ensure utf8 CGI params:
  $CGI::PARAM_UTF8 = 1;

  binmode STDIN, ":utf8";
  binmode STDOUT, ":utf8";
  binmode STDERR, ":utf8";

This works fine in CGI mode: when I ask for $foo = $cgi->param('foo'),
DBI::data_string_desc($foo) shows a UTF8 string with the proper
discrepency between bytes and chars.

But when I try to run it under mod_perl, the returned string appears
to be the raw ascii bytes, and definitely not utf8.  Of course, when I
store that in the database (using DBD::Pg), the "latin-1" is encoded
to "utf-8", and I get a bunch of weird chars on the output.

Has anyone managed to round-trip UTF8 from form to database and back
using a setup similar to this?

I suspect part of the problem is this in CGI.pm:

    'read_from_client' => <<'END_OF_FUNC',
    # Read data from a file handle
    sub read_from_client {
    my($self, $buff, $len, $offset) = @_;
    local $^W=0;                # prevent a warning
    return $MOD_PERL
        ? $self->r->read($$buff, $len, $offset)
            : read(\*STDIN, $$buff, $len, $offset);
    }
    END_OF_FUNC

Since I binmode STDIN, the non-$MOD_PERL works ok here.  What's the
equivalent of $r->read() that marks the incoming stream as UTF8, so I
get chars instead of bytes?  Or can I just read(\*STDIN) in mod_perl2
as well? (I know that was supported at one point...)



-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<mer...@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix consulting, Technical writing, Comedy, etc. etc.
Still trying to think of something clever for the fourth line of this .sig

mod_perl and utf8 and CGI->param

Reply via email to