Paul Kulchenko <[EMAIL PROTECTED]> writes:
> --- Gisle Aas <[EMAIL PROTECTED]> wrote:
> > What you want is LWP to deal with strings with the _internal_ UTF8
> > flag set. In my view LWP can't guess what encoding to apply to
> Quite the opposite. I want LWP to treat strings as binary strings
> regardless of the encoding used.
There is no obvious way to deal with strings containing chars outside
0..255 as _binary strings_.
> length() as used there treats the string as a
> set of chars instead of bytes, thus making it impossible for LWP to
> specify a proper Content-Length and call sysread/syswrite with proper
Correct. The best fix is to not provide such strings. syswrite()
refuses to deal with them too.
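
A minimal sketch of the mismatch, assuming a sample string with one character above 255 (the string itself is only an illustration): length() counts characters on the unencoded string and bytes on the encoded one, and only the byte count is usable as a Content-Length.

```perl
use strict;
use warnings;
use Encode ();

# Sample string (illustrative only): six ASCII chars plus U+263A.
my $chars = "smile \x{263A}";

# The same text serialized explicitly as UTF-8 bytes.
my $bytes = Encode::encode_utf8($chars);

my $char_count = length($chars);   # number of characters
my $byte_count = length($bytes);   # number of bytes on the wire

print "chars: $char_count, bytes: $byte_count\n";
```

The two counts differ because U+263A takes three bytes in UTF-8.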
> > serialize that kind of string. UTF-8 is not really a more obvious
> > choice than UTF-16. If it happens to be an image that somehow got
> > the
> > UTF8 flag set then any UTF-encoding would be wrong, as the string
> > should simply be utf8_downgraded to be ok again.
> Absolutely. That's exactly what should be done imho. The string should be
> downgraded to a set of bytes.
But only when all chars are 0..255.
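
A small sketch of that condition, using utf8::downgrade() with its FAIL_OK argument so that failure is reported as a false return value instead of an exception. The two sample strings are made up for illustration.

```perl
use strict;
use warnings;

# All chars are in 0..255, but the string is upgraded internally.
my $latin = "caf\xE9";
utf8::upgrade($latin);

# Second argument is FAIL_OK: return false on failure instead of dying.
my $ok_latin = utf8::downgrade($latin, 1);

# Contains U+263A, which does not fit in a single byte.
my $wide = "smile \x{263A}";
my $ok_wide = utf8::downgrade($wide, 1);

print "latin downgraded: ", ($ok_latin ? "yes" : "no"), "\n";
print "wide downgraded: ",  ($ok_wide  ? "yes" : "no"), "\n";
```

Only the first string can be downgraded. The second one needs an explicit encoding step before it can be sent.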
> > > What do you expect me to do?
> > If you want the string UTF8 encoded, then say so explicitly:
> > $req = HTTP::Request->new(POST => $endpoint);
> > $req->content_type("text/plain; charset=UTF-8");
> > $req->content(Encode::encode_utf8($utf8));
> But that's precisely what I do. Content is being specified
> incorrectly because length() calculates chars on utf8-encoded
> strings. That's where we started.
You did not have the Encode::encode_utf8() call. If you do, everything
should be fine, and all chars in the content will be in the range 0..255.
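
Putting it together, a hedged sketch of the whole request. The endpoint URL and the sample text are hypothetical, and the explicit content_length() call is only there to make the byte count visible.

```perl
use strict;
use warnings;
use Encode ();
use HTTP::Request ();

my $endpoint = "http://localhost/endpoint";   # hypothetical URL
my $text     = "smile \x{263A}";              # sample text with a char > 255

# Encode explicitly; the result contains only chars in 0..255.
my $body = Encode::encode_utf8($text);

my $req = HTTP::Request->new(POST => $endpoint);
$req->content_type("text/plain; charset=UTF-8");
$req->content($body);

# length() now counts bytes, so this matches what goes on the wire.
$req->content_length(length $body);

print $req->as_string;
```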