Paul Kulchenko <[EMAIL PROTECTED]> writes:

> Hi, Gisle!

Hi, Paul!

> --- Gisle Aas <[EMAIL PROTECTED]> wrote:
> > In my view it is a bug to put content containing chars with
> > ord() > 255 in the content of a HTTP::Request.  If you want UTF8
> > encoded stuff you should put UTF8 encoded stuff in the content.
> > Don't expect perl to magically guess.  You should use
> > Encode::encode_utf8($str) or something like it.
>
> I'm not sure I follow you. I do have my string utf8 encoded using
> Perl capabilities. What do you mean "is a bug to put content
> containing chars with ord() > 255 in the content of a
> HTTP::Request"? What should I do then if I have utf8 encoded string
> to send? I expect LWP will handle it properly on wire.

If you have a utf8 encoded string then none of the chars in the string
will have ord() > 255.  It is the kind of string that
Encode::encode_utf8() would produce.

What you want is LWP to deal with strings with the _internal_ UTF8
flag set.  In my view LWP can't guess what encoding to apply to
serialize that kind of string.  UTF-8 is not really a more obvious
choice than UTF-16.  If it happens to be an image that somehow got the
UTF8 flag set then any UTF-encoding would be wrong, as the string
should simply be utf8_downgraded to be ok again.

> > If there was an easy way I would like to add a
> >
> >   sv_utf8_downgrade($req->content, 0);
> >
> > to the LWP::Protocol code.  This would make requests with such
> > chars in them fail early.
>
> I don't understand why they should be failed. What's wrong with this:
>
>   $utf8 = pack('U*', unpack('C*', $something_russian_latin1_encoded));
>   $req = HTTP::Request
>     ->new(POST => $endpoint, HTTP::Headers->new, $utf8);
>   $resp = LWP::UserAgent->new->request($req);
>
> request won't be properly encoded in 5.6.1 and later unless you drop
> utf8 mark from $utf8. I do need to have utf8 encoding on wire.
>
> What do you expect me to do?

If you want the string UTF8 encoded, then say so explicitly:

  $req = HTTP::Request->new(POST => $endpoint);
  # declare the charset in the header, without quotes around the value
  $req->content_type("text/plain; charset=UTF-8");
  # put the encoded bytes, not the character string, in the content
  $req->content(Encode::encode_utf8($utf8));

> > > # drop UTF mark
> > > $str = pack('C0A*', $str) if length($str) != bytelength($str);
> > >
> > > Ideally I would like to have it fixed in LWP::Protocol. btw, how
> > > quick is pack 'C0A*'?
> >
> > It will certainly have to copy the string.  I think it would be
> > better to try to use one of the functions the Encode module
> > provides.
>
> I need it to work with all Perls starting 5.005. Encode wasn't
> available in 5.6.x, was it?

That is true.  How about something like this (untested):

  eval { require Encode; };
  if ($@) {
      # Encode could not be loaded: install a replacement
      *Encode::encode_utf8 = sub { pack('C0A*', shift) };
  }

Actually, I think pack is buggy if this works.  The A* really ought to
downgrade the string it packs and croak if this is not possible.

Regards,
Gisle
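
To illustrate the distinction drawn above between a character string and its UTF-8 encoded bytes, here is a minimal sketch. It assumes a perl where the Encode module is available; the variable names and the max_ord() helper are made up for this example and are not part of LWP or Encode.

```perl
use strict;
use warnings;
use Encode ();

# A character string holding three Cyrillic characters
# (U+0416, U+0443, U+043A); every char has ord() > 255.
my $chars = "\x{416}\x{443}\x{43A}";

# Explicit serialization to UTF-8 octets.  This is the kind of
# string that belongs in the content of a request.
my $bytes = Encode::encode_utf8($chars);

printf "chars: length=%d, max ord=%d\n", length($chars), max_ord($chars);
printf "bytes: length=%d, max ord=%d\n", length($bytes), max_ord($bytes);

# Hypothetical helper: largest ord() of any char in the string.
sub max_ord {
    my ($s) = @_;
    my $max = 0;
    for my $c (split //, $s) {
        $max = ord($c) if ord($c) > $max;
    }
    return $max;
}
```

On such a perl this should report length 3 and max ord 1091 for the character string, and length 6 and max ord 209 for the encoded bytes, so nothing in the encoded string exceeds 255.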