Re: Charset in response

André Warnier Thu, 29 Nov 2012 04:19:55 -0800

Addendum at end.

André Warnier wrote:

Hi.
I have a problem with a PerlResponseHandler, regarding the character set used in the response to a request. Basically, the question is : how do I set the character set properly for the "handle" used in
$r->print("string") ?
(where string can be "äéèöü" for example)
Neither of the following (which I do before starting to print output) seems to work :
$r->headers_out->unset('content-type');
$r->headers_out->set('content-type','text/html;charset=xxxx');

or

$r->content_type('text/html;charset=xxxx');

When I say that it doesn't work, I mean in fact :
- the "Content-Type" response header sent by the server is properly set according to what I do above (as verified in a browser plugin)
- but if what I print contains "accented" characters, they are not being encoded properly
So, do I need to set something else so that $r->print(string) will output "string" properly ?

Background :
My PerlResponseHandler reads an HTML file from disk, replaces some strings in it, and sends the result out via $r->print. The source HTML file can be encoded in iso-8859-1 or UTF-8, and it contains a proper declaration of the charset in which it is really encoded :
<meta http-equiv="content-type" content="text/html; charset=iso-8859-1">
or
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
To read the file, I first open it "raw", read a few lines, checking for the above <meta> tag. If found, I note the charset (say in $charset), close the file, and re-open it as
open(my $fh,"<:encoding($charset)", $file);

(note : if $charset is "UTF-8", then the open becomes
open(my $fh,'<:utf8', $file);)
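The detect-then-reopen step described above can be sketched as follows. The helper names `detect_charset` and `open_decoded`, the 20-line scan limit and the exact regular expression are assumptions for illustration, not the code actually used in the handler.

```perl
use strict;
use warnings;

# Hypothetical helper: scan the first few lines of an HTML file, opened raw,
# for a <meta ... charset=...> declaration. Returns the charset name, or
# undef if none is found within $max_lines lines.
sub detect_charset {
    my ($file, $max_lines) = @_;
    $max_lines //= 20;    # assumed limit: the declaration sits near the top
    open(my $raw, '<:raw', $file) or die "Cannot open $file: $!";
    my $charset;
    while (defined(my $line = <$raw>)) {
        if ($line =~ /<meta[^>]+charset\s*=\s*["']?([A-Za-z0-9_.:-]+)/i) {
            $charset = $1;
            last;
        }
        last if $. >= $max_lines;
    }
    close($raw);
    return $charset;
}

# Hypothetical helper: re-open the file with a decoding layer for $charset,
# so that every line read from it is a Perl character string.
sub open_decoded {
    my ($file, $charset) = @_;
    open(my $fh, "<:encoding($charset)", $file) or die "Cannot open $file: $!";
    return $fh;
}
```

In the handler this would be used as `my $fh = open_decoded($file, detect_charset($file) // 'iso-8859-1');`, where the iso-8859-1 fallback is likewise an assumption.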

I also at that point set the response charset by one of the means above.
Then I read the file line by line, substituting some strings in the line, and print out the line via
$r->print($line);
etc..
My problem is that, if the input file is for example iso-8859-1 and contains the word "Männer", the output comes out as "M(A tilde)(some byte)nner" (the bytes corresponding to the UTF-8 encoding of the "a umlaut").
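The symptom can be reproduced outside Apache with the core Encode module. This is a minimal sketch, assuming the characters leave the server UTF-8 encoded while the Content-Type header announces iso-8859-1; it shows where the "A tilde" and the extra byte come from.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# "Männer" as a Perl character string (U+00E4 is "ä").
my $word = "M\x{e4}nner";

# If the characters go out UTF-8 encoded, "ä" becomes the two bytes C3 A4.
my $utf8_bytes = encode('UTF-8', $word);

# A browser told "charset=iso-8859-1" decodes those bytes one by one:
# 0xC3 is "Ã" (A tilde) and 0xA4 is "¤" (currency sign).
my $as_seen = decode('iso-8859-1', $utf8_bytes);

# Show the byte values that were actually sent.
print join(' ', map { sprintf '%02X', ord } split //, $utf8_bytes), "\n";
# prints: 4D C3 A4 6E 6E 65 72
```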
Can I / should I do something like
binmode($r,":$charset"); # ??
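One way to avoid depending on which I/O layers, if any, apply to $r is to encode each line explicitly into the declared charset just before printing, so that $r->print only ever receives bytes. This sketch assumes that $r->print passes strings through without re-encoding them, which is what the log and browser output in the addendum suggest; `encode_for_output` is a hypothetical helper name.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: turn a decoded (character) string into bytes in the
# charset announced in the Content-Type header. With the default CHECK
# value, characters that $charset cannot represent are replaced by a
# substitution character instead of causing an error.
sub encode_for_output {
    my ($line, $charset) = @_;
    return encode($charset, $line);
}

# Inside the handler this would be used roughly as follows
# (not runnable outside mod_perl):
#   $r->content_type("text/html; charset=$charset");
#   while (defined(my $line = <$fh>)) {
#       # ... substitutions on the character string ...
#       $r->print(encode_for_output($line, $charset));
#   }
```

Keeping all substitutions on character strings and encoding only at the output boundary means the same code works for both iso-8859-1 and UTF-8 source files.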

TIA


Addendum : I added some logging to the ResponseHandler as follows :

PARAM: while (defined($line = <$form_fh>)) {
        if ($Debug > 1) {
                $r->log->warn(" input line is [$line], utf8 flag : " . (Encode::is_utf8($line) ? "y" : "n"));
        }

The corresponding line in the log, for a line containing the word "männlich", is :

[Thu Nov 29 10:34:37 2012] [warn] [client 192.168.245.129] input line is [\t\t\t\t<input name="ANSPR" type="radio" value="m" id="ANSPR"> m\xc3\xa4nnlich\n], utf8 flag : y

Of course, as is usual in this type of case, one never knows how the logfile itself is written... But it does confirm that, as read in the handler, the string is properly encoded internally in Perl, with the utf8 flag set.
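The `\xc3\xa4` in the log line is consistent with a correctly decoded string: a string with the utf8 flag on is stored internally as UTF-8, so anything that writes out that internal buffer unchanged shows the two bytes C3 A4 for "ä". The sketch below, which uses `utf8::upgrade` only to force that internal representation, contrasts Perl's character count with the UTF-8 byte count.

```perl
use strict;
use warnings;
use Encode qw(encode);

# A decoded line, as produced by the <:encoding(iso-8859-1) layer.
my $line = "m\x{e4}nnlich";
utf8::upgrade($line);    # force the internal UTF-8 representation (flag on)

# Perl counts characters: 8.
my $chars = length($line);

# The same text as UTF-8 bytes: 9, because "ä" takes two bytes (C3 A4).
my $bytes = length(encode('UTF-8', $line));

print "characters: $chars, UTF-8 bytes: $bytes\n";
# prints: characters: 8, UTF-8 bytes: 9
```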

However, when I look in the result as received by the browser,
- the browser says that the page received is encoded as iso-8859-1

- the browser's "view page source" confirms that this character is (incorrectly) represented by 2 bytes :

        <input name="ANSPR" type="radio" value="m" id="ANSPR">&nbsp;mÃ¤nnlich
