On Tue, Dec 8, 2009 at 12:26 AM, Aristotle Pagaltzis <[email protected]> wrote:
>
> There is no such thing as an octet stream in Perl. There are only
> strings, and strings are sequences of arbitrarily large integers.
>
Help me out here.
What I've got stuck in my mind is that the poorly named utf8 flag on Perl
strings is really the "is_character_data" flag. To get character data, the
input *must* be decoded, and the act of decoding sets that flag. Even
decoding an 8-bit character encoding sets the flag.
$ perl -MEncode -wle '$x=Encode::decode("ASCII", "hello"); print
Encode::is_utf8( $x ) ? "flag set\n" : "no flag\n";'
flag set
$ perl -MEncode -wle '$x=Encode::decode("iso-8859-1", "hello"); print
Encode::is_utf8( $x ) ? "flag set\n" : "no flag\n";'
flag set
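The same check is more telling with non-ASCII input, where decoding also changes what length() reports. A minimal sketch, assuming the incoming octets are the UTF-8 encoding of "café" (the variable names are made up for illustration):

```perl
use strict;
use warnings;
use Encode ();

# Five octets as they might arrive from a socket: "caf" followed by the
# two-byte UTF-8 sequence for U+00E9 (e with acute accent).
my $octets = "caf\xC3\xA9";

# Decoding turns the five octets into a four-character string.
my $chars = Encode::decode('UTF-8', $octets);

printf "octets: length %d, flag %s\n",
    length($octets), Encode::is_utf8($octets) ? 'set' : 'not set';
printf "chars:  length %d, flag %s\n",
    length($chars), Encode::is_utf8($chars) ? 'set' : 'not set';
```

The octet string reports length 5 with no flag, and the decoded string reports length 4 with the flag set.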
And any strings with the flag set *must* be encoded before printing (sending
out of Perl) -- otherwise you are printing abstract "characters" that have
no meaning outside of Perl.
Plus, content_length must be the length of the encoded body in bytes, not
the number of characters. Therefore, it's impossible to set the content
length on character data unless you encode it first.
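To make the mismatch concrete, here is a small sketch, assuming the body is sent as UTF-8. The $body value and the printed header line are illustrative only and are not taken from any framework:

```perl
use strict;
use warnings;
use Encode ();

# A character string that ends with U+20AC EURO SIGN.
my $body = "price: 10 \x{20AC}";

# Length in characters: what length() reports before encoding.
my $char_length = length $body;

# Length in bytes: what must go into the Content-Length header.
my $encoded     = Encode::encode('UTF-8', $body);
my $byte_length = length $encoded;

print "characters:     $char_length\n";
print "Content-Length: $byte_length\n";
```

The euro sign is one character but three bytes in UTF-8, so the two numbers differ by two.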
So the code seems like it must be:
die "no clue how long the body is because it's still characters" if
Encode::is_utf8( $response->body );
$response->content_length( length( $response->body ) );
That's not very friendly, of course. But, what other choice is there?
The correct thing would be to force all responses to have a defined content
type and then encode the characters at the end of the request (right before
setting content length).
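A rough sketch of that idea, under stated assumptions: the response is a plain hash rather than a real response object, the content type carries a charset parameter, and finalize_body is a made-up helper name, not an existing API.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical stand-in for a response object: a plain hash with a
# content type and a body that is still a character string.
my %response = (
    content_type => 'text/html; charset=UTF-8',
    body         => "<p>\x{263A} done</p>",
);

# Encode the body using the charset named in the content type, then
# record the byte length. Dies if no charset is declared, because the
# encoding cannot be guessed.
sub finalize_body {
    my ($res) = @_;
    my ($charset) = $res->{content_type} =~ /charset=([\w.:-]+)/i
        or die "no charset in content type, cannot encode body\n";
    $res->{body}           = Encode::encode($charset, $res->{body});
    $res->{content_length} = length $res->{body};
    return $res;
}

finalize_body(\%response);
print "Content-Length: $response{content_length}\n";
```

Doing the encoding in one place at the end means everything before it can work purely with characters, and the length is always computed on bytes.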
--
Bill Moseley
[email protected]
_______________________________________________
List: [email protected]
Listinfo: http://lists.scsys.co.uk/cgi-bin/mailman/listinfo/catalyst
Searchable archive: http://www.mail-archive.com/[email protected]/
Dev site: http://dev.catalyst.perl.org/