Addendum at end.

André Warnier wrote:
Hi.

I have a problem with a PerlResponseHandler, regarding the character set used in the response to a request. Basically, the question is: how do I set the character set properly for the "handle" used in
$r->print("string")?
(where string can be "äéèöü", for example)

Neither of the following (which I do before I start printing output) seems to work:

$r->headers_out->unset('content-type');
$r->headers_out->set('content-type','text/html;charset=xxxx');

or

$r->content_type('text/html;charset=xxxx');

When I say that it doesn't work, I mean that:
- the "Content-Type" response header sent by the server is set correctly according to what I do above (as verified with a browser plugin),
- but if what I print contains "accented" characters, they are not encoded properly.

So, do I need to set something else so that $r->print(string) outputs "string" properly?


Background:

My PerlResponseHandler reads an HTML file from disk, substitutes some strings in it, and sends the result out via $r->print. The source HTML file can be encoded in iso-8859-1 or UTF-8, and it contains a proper declaration of the charset in which it is actually encoded:

<meta http-equiv="content-type" content="text/html; charset=iso-8859-1">
or
<meta http-equiv="content-type" content="text/html; charset=UTF-8">

To read the file, I first open it "raw" and read a few lines, looking for the above <meta> tag. If it is found, I note the charset (say in $charset), close the file, and re-open it as:

open(my $fh,"<:encoding($charset)", $file);

(Note: if $charset is "UTF-8", the open becomes
open(my $fh, '<:utf8', $file);)

At that point I also set the response charset, using one of the methods above.
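The sniff-then-reopen step described above might look roughly like the following standalone sketch, runnable outside Apache. The helper name open_with_declared_charset, the 20-line scan limit and the iso-8859-1 fallback are assumptions for illustration, not the handler's actual code:

```perl
use strict;
use warnings;

# Hypothetical helper: scan the first few lines of $file as raw bytes for a
# <meta ... charset=...> declaration, then reopen the file with a matching
# PerlIO :encoding layer so that each line read is a decoded character string.
sub open_with_declared_charset {
    my ($file, $max_lines) = @_;
    $max_lines //= 20;                  # assumption: the <meta> tag is near the top
    my $charset = 'iso-8859-1';         # assumption: fallback when no tag is found

    open(my $raw, '<:raw', $file) or die "Cannot open $file: $!";
    while (defined(my $line = <$raw>)) {
        if ($line =~ /<meta\s[^>]*charset=["']?([\w.-]+)/i) {
            $charset = $1;
            last;
        }
        last if $. >= $max_lines;
    }
    close($raw);

    # :encoding(UTF-8) validates its input; the :utf8 layer does not.
    open(my $fh, "<:encoding($charset)", $file) or die "Cannot reopen $file: $!";
    return ($fh, $charset);
}
```

Returning the charset together with the handle makes it easy to use the same value for the Content-Type header.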

Then I read the file line by line, substitute some strings in each line, and print the line out via
$r->print($line);
etc.

My problem is that if the input file is, for example, iso-8859-1 and contains the word "Männer", the output comes out as "MÃ¤nner" ("M", then an A tilde and a currency sign, then "nner"): the two bytes 0xC3 0xA4 are the UTF-8 encoding of the "a umlaut".
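This symptom matches what happens when the UTF-8 bytes of "ä" are displayed as iso-8859-1. The following standalone sketch reproduces it with the core Encode module, outside of Apache; it only illustrates the effect and does not claim to show what $r->print does internally:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $word       = "M\x{e4}nner";                    # character string, as read via :encoding(...)
my $utf8_bytes = encode('UTF-8', $word);            # its UTF-8 bytes: 7 of them
my $as_latin1  = decode('iso-8859-1', $utf8_bytes); # how a browser told "iso-8859-1" reads them

binmode(STDOUT, ':encoding(UTF-8)');
print length($utf8_bytes), "\n";                    # 7
print $as_latin1, "\n";                             # MÃ¤nner
```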

Can I / should I do something like
binmode($r,":$charset"); # ??
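One possible alternative to a layer on $r, sketched under the assumption (suggested by the symptom, not confirmed) that $r->print passes its argument through as raw bytes, is to encode each line explicitly into the response charset before printing. The helper print_line and the $out callback standing in for $r->print are hypothetical, so the sketch runs outside Apache:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: turn a decoded character string into bytes in the
# charset announced in the Content-Type header, then hand the bytes to the
# output callback (inside a handler this might be: sub { $r->print(@_) }).
sub print_line {
    my ($out, $charset, $line) = @_;
    # FB_HTMLCREF replaces characters that $charset cannot represent with &#NNN;
    my $bytes = encode($charset, $line, Encode::FB_HTMLCREF());
    $out->($bytes);
}

my $sent    = '';
my $collect = sub { $sent .= join('', @_) };
print_line($collect, 'iso-8859-1', "M\x{e4}nner \x{20ac}\n");
# $sent now holds "M\xe4nner &#8364;\n": one byte for the a-umlaut, and the
# euro sign, which iso-8859-1 lacks, as an HTML character reference.
```

Encoding at the point of output keeps the Content-Type charset and the bytes actually sent tied to the same $charset value.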

TIA



Addendum: I added some logging to the ResponseHandler, as follows:

PARAM: while (defined($line = <$form_fh>)) {

    if ($Debug > 1) {
        $r->log->warn(" input line is [$line], utf8 flag : "
            . (Encode::is_utf8($line) ? "y" : "n"));
    }
    # ... substitutions, then $r->print($line) ...
}

The corresponding line in the log, for a line containing the word "männlich", is:

[Thu Nov 29 10:34:37 2012] [warn] [client 192.168.245.129] input line is [\t\t\t\t<input name="ANSPR" type="radio" value="m" id="ANSPR">&nbsp;m\xc3\xa4nnlich\n], utf8 flag : y

Of course, as is usual in this kind of case, one never knows how the logfile itself is encoded. But it does confirm that the string, as read in the handler, is properly decoded internally in Perl, with the utf8 flag set.
However, when I look at the result as received by the browser:
- the browser says that the page received is encoded as iso-8859-1;
- the browser's "view page source" confirms that this character is (incorrectly) represented by 2 bytes:
        <input name="ANSPR" type="radio" value="m" id="ANSPR">&nbsp;mÃ¤nnlich
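Since the log file's own encoding makes the log line ambiguous, a logging variant that prints code points instead of raw characters would show unambiguously what the handler holds before printing. The helper describe_chars is hypothetical:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical diagnostic: list each non-ASCII character of $s as U+XXXX,
# so the log shows code points instead of bytes whose rendering depends on
# how the log file is read.
sub describe_chars {
    my ($s) = @_;
    my @non_ascii = map { sprintf('U+%04X', ord($_)) } grep { ord($_) > 127 } split //, $s;
    return join(' ', @non_ascii) . ' (utf8 flag: ' . (utf8::is_utf8($s) ? 'y' : 'n') . ')';
}

# A properly decoded line holds one character, U+00E4, for the a-umlaut ...
print describe_chars(decode('UTF-8', "m\xc3\xa4nnlich")), "\n";
# ... while an undecoded byte string holds two, U+00C3 U+00A4.
print describe_chars("m\xc3\xa4nnlich"), "\n";
```

In the handler, the warn() call could log describe_chars($line) alongside the line itself.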
