Paul Kulchenko <[EMAIL PROTECTED]> writes:

> --- Gisle Aas <[EMAIL PROTECTED]> wrote:
> > What you want is LWP to deal with strings with the _internal_ UTF8
> > flag set.  In my view LWP can't guess what encoding to apply to
> Quite the opposite. I want LWP to deal with strings as binary strings
> regardless of the encoding used.

There is no obvious way to deal with strings containing chars outside
0..255 as _binary strings_.
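
A minimal sketch of why, using only the core Encode module (the sample
character is my own choice for illustration): one character above 255
has no single byte form, and its byte length depends on which encoding
is picked.

```perl
use strict;
use warnings;
use Encode qw(encode);

# One character above 255 (U+263A), so no single byte can represent it.
my $text = "\x{263A}";

my $chars       = length($text);                      # counts characters
my $utf8_bytes  = length(encode('UTF-8',    $text));  # octets as UTF-8
my $utf16_bytes = length(encode('UTF-16BE', $text));  # octets as UTF-16BE

print "chars=$chars utf8=$utf8_bytes utf16be=$utf16_bytes\n";
```

The two byte counts differ, so a Content-Length cannot be derived from
the character string alone.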

>                         length() as used there treats the string as a
> set of chars instead of bytes, making it impossible for LWP to
> specify the proper Content-Length and call sysread/syswrite with the
> proper size.

Correct.  The best fix is to not provide such strings.  syswrite()
refuses to deal with them too.
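
The syswrite() behaviour can be checked with a small sketch like the
following.  The scratch file and the sample string are assumptions made
for the demonstration; a socket handle should behave the same way.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file opened as a raw, byte-oriented handle.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode($fh);

my $wide = "caf\x{E9} \x{263A}";   # contains U+263A, which is > 255

# syswrite() croaks rather than guessing an encoding for the wide char.
my $ok    = eval { syswrite($fh, $wide); 1 };
my $error = $ok ? '' : $@;

print "syswrite failed: $error" unless $ok;
```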

> > serialize that kind of string.  UTF-8 is not really a more obvious
> > choice than UTF-16.  If it happens to be an image that somehow got
> > the UTF8 flag set then any UTF-encoding would be wrong, as the string
> > should simply be utf8_downgraded to be ok again.
> Absolutely. That's exactly what should be done imho. The string should
> be downgraded to a set of bytes.

But only when all chars are 0..255.
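
As a sketch of that condition (the sample data is made up for
illustration): utf8::downgrade() with a true second argument returns
false instead of dying when a char above 255 is present.

```perl
use strict;
use warnings;

# Binary data that picked up the internal UTF8 flag by accident.
my $image = "\x89PNG\xFF\xD8";
utf8::upgrade($image);             # same characters, flag now set

# Text with a character above 255 (U+20AC).
my $wide = "price: \x{20AC}";

# Second argument true: return false on failure instead of dying.
my $image_ok = utf8::downgrade($image, 1);
my $wide_ok  = utf8::downgrade($wide, 1);

printf "image: %s, %d bytes\n",
    ($image_ok ? "downgraded" : "failed"), length($image);
printf "wide:  %s\n", ($wide_ok ? "downgraded" : "failed");
```

The image data comes back as plain bytes; the second string cannot be
downgraded and needs an explicit encoding.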

> > > What do you expect me to do?
> > If you want the string UTF8 encoded, then say so explicitly:
> >  $req = HTTP::Request->new(POST => $endpoint);
> >  $req->content_type("text/plain; charset=UTF-8");
> >  $req->content(Encode::encode_utf8($utf8));
> But that's precisely what I do. Content is being specified
> incorrectly because length() calculates chars on utf8-encoded
> strings. That's where we started.

You did not have the Encode::encode_utf8() call.  If you do, everything
should be fine, and all chars in the content will be in range 0..255.
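
A small sketch of the effect, independent of LWP (the sample text and
variable names are only for illustration): after encode_utf8(),
length() counts octets, which is the number Content-Length needs.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);
use List::Util qw(max);

my $text  = "na\x{EF}ve \x{263A}";   # 7 characters, one of them above 255
my $bytes = encode_utf8($text);      # UTF-8 octets

my $char_count     = length($text);
my $content_length = length($bytes); # now a byte count
my $max_ord        = max(map { ord } split //, $bytes);

print "chars=$char_count bytes=$content_length max_ord=$max_ord\n";
```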

