apreq validates anything it presents as UTF-8; if that fails, it marks the
data as ISO-8859-1 or as some Windows encoding whose name I don't remember.



On Monday, September 8, 2014 3:17 PM, André Warnier <a...@ice-sa.com> wrote:
 


Michael Schout wrote:

> On 9/2/14, 4:19 PM, Randal L. Schwartz wrote:
> 
>>   ## ensure utf8 CGI params:
>>   $CGI::PARAM_UTF8 = 1;
> 
> Sorry to chime in late on this, but part of the problem with CGI.pm and
> UTF-8 is that PARAM_UTF8 gets clobbered by a cleanup handler that CGI.pm
> itself registers if it's running under mod_perl.
> 
> This caused major headaches for me at one time until I figured this out.
> 
> You have to make sure to set $CGI::PARAM_UTF8 early, and FOR EVERY
> REQUEST, because if you just set it globally (e.g.: in a startup perl
> script), then it only works for the first request.
> 

Hi.
Just an addendum to the discussion:

There are really two distinct approaches to this issue, and they work at
different levels:

1) The first is to "fix" CGI.pm so that it delivers the parameters in the way
you expect.
As the previous valuable technical contributions show, this generally works,
but it requires a certain level of expertise, and it does not necessarily work
with all older versions of mod_perl and CGI.pm.

2) The second is to take whatever CGI.pm does deliver to the calling script or
module, and to use a couple of tricks and some additional code in that script
or module to ensure that, whatever CGI.pm delivers under whatever mod_perl
version, the receiving code always knows in the end what it is dealing with.
That is the method which I presented early in the discussion.
As stated in that contribution, it is not necessarily the most elegant or
efficient way to deal with the issue, but it has the advantage of always
working, no matter which versions of CGI.pm and/or mod_perl are in use.
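
A minimal sketch of what method 2 can look like, using only the core Encode
module. It is an illustration, not the code from the earlier contribution:
the helper name decode_param is hypothetical, and both the ISO-8859-1
fallback and the treatment of utf8-flagged strings as "already decoded" are
assumptions made for this example.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper: turn whatever CGI.pm delivered into a Perl
# character string, regardless of how CGI.pm was configured.
sub decode_param {
    my ($value) = @_;
    return $value unless defined $value;

    # Assumption: a utf8-flagged string was already decoded upstream.
    return $value if utf8::is_utf8($value);

    # Try a strict UTF-8 decode. FB_CROAK makes decode() die on malformed
    # input; LEAVE_SRC keeps $value unmodified for the fallback below.
    my $chars = eval { decode('UTF-8', $value, FB_CROAK | LEAVE_SRC) };
    return $chars if defined $chars;

    # Assumed fallback: any byte sequence is valid ISO-8859-1.
    return decode('ISO-8859-1', $value);
}

# The same name submitted as UTF-8 bytes and as ISO-8859-1 bytes
# ends up as the same character string.
my $from_utf8   = decode_param("Andr\xC3\xA9");
my $from_latin1 = decode_param("Andr\xE9");
```

The guess is not infallible: a short ISO-8859-1 string can happen to be valid
UTF-8, which is exactly the uncertainty about client encodings discussed
below.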

The real crux of the matter is this, in my view: as things stand today in
terms of protocol and RFCs, there is no real way for CGI.pm (or any comparable
framework) to be *sure* of the encoding of the data sent by a browser or
another HTTP client agent.  Even the RFCs do not really provide a way by which
this can be enforced. (*)

So if you are sure of what the client is sending, and the matter consists of
*forcing* CGI.pm to always communicate POST (or GET) data as UTF-8 encoded and
utf8-marked (or the opposite) to the calling script/module, then method 1 will
work, and it is more elegant and probably more efficient than method 2.

But if the matter consists of ensuring that the receiving code in the
script/module which handles the data submitted by the HTTP client is resilient
and "does the right thing" whatever the submitted data really was, then in my
opinion method 2 is better.
(But that's only my opinion of the moment, and I stand ready to be corrected.)

(*) And if you believe this not to be true, please send me some references
about it, because I am really interested. It might save me some code in all my
web-facing applications.
