* David Nesting wrote:
>For most uses of libwww, developers do little with character encoding.
>Indeed, for general-case use of LWP::Simple, they can't, because that
>information isn't even exposed.  Has any thought gone into doing this
>internally within libwww, so that when I fetch content, I get back text
>instead of octets?

Generally speaking, this is rather difficult: some content is not textual
at all, and textual formats differ in how applications are to detect the
encoding (e.g., XML has different rules than HTML, text/plain has no rules
beyond looking at the charset parameter, and so on). If you want a
general-purpose solution, a good start would be a module that takes an
HTTP::Response object, detects the encoding, and optionally decodes the
content on request.
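
For illustration, here is a rough sketch of what such per-type detection
could look like. It is written in Python rather than Perl only as a
language-neutral illustration. It is not libwww's or HTML::Encoding's API,
and the names detect_encoding and decode_body are hypothetical. It assumes
you have the Content-Type header value and the raw body octets (as an
HTTP::Response would hold them). It lets an explicit charset parameter win
for every type and skips several special cases, for example the us-ascii
default RFC 3023 gives text/xml, UTF-32 byte order marks, and most of the
HTML5 meta prescan.

```python
import codecs
import re
from email.message import Message
from typing import Optional

# Byte order marks this sketch recognises (UTF-32 is deliberately ignored).
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]
# Simplified patterns: the XML declaration and an HTML <meta ... charset=...>.
_XML_DECL = re.compile(rb'^<\?xml[^>]*?\sencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']')
_HTML_META = re.compile(rb'<meta[^>]+?charset\s*=\s*["\']?([A-Za-z0-9._:-]+)', re.IGNORECASE)


def _sniff_bom(body: bytes) -> Optional[str]:
    """Return the encoding implied by a leading byte order mark, if any."""
    for bom, name in _BOMS:
        if body.startswith(bom):
            return name
    return None


def detect_encoding(content_type: str, body: bytes) -> Optional[str]:
    """Guess the body's encoding; None means "unknown or not text"."""
    msg = Message()
    msg["Content-Type"] = content_type
    media_type = msg.get_content_type()   # lower-cased "type/subtype"
    charset = msg.get_content_charset()   # lower-cased charset parameter or None

    # 1. In this sketch an explicit charset parameter wins for every type.
    if charset:
        return charset

    # 2a. XML: byte order mark, then the encoding declaration, else UTF-8.
    if media_type in ("text/xml", "application/xml") or media_type.endswith("+xml"):
        bom = _sniff_bom(body)
        if bom:
            return bom
        decl = _XML_DECL.match(body)
        return decl.group(1).decode("ascii").lower() if decl else "utf-8"

    # 2b. HTML: byte order mark, then a <meta> charset in the first 1024 bytes.
    if media_type == "text/html":
        bom = _sniff_bom(body)
        if bom:
            return bom
        meta = _HTML_META.search(body[:1024])
        return meta.group(1).decode("ascii").lower() if meta else None

    # 3. text/plain and other text/* types only have the (absent) charset
    #    parameter; non-text types are not decoded at all.
    return None


def decode_body(content_type: str, body: bytes) -> Optional[str]:
    """Decode on request; None means "keep the octets", no encoding known."""
    encoding = detect_encoding(content_type, body)
    if encoding is None:
        return None
    try:
        text = body.decode(encoding, errors="replace")
    except LookupError:  # label not in Python's codec registry
        return None
    return text[1:] if text.startswith("\ufeff") else text  # drop a decoded BOM
```

Returning None instead of guessing keeps "detect" and "decode on request"
separate. The caller can then fall back to handling the content as octets.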

>I'd be happy to help work on some of this, but the fact that I see no
>use of character encodings within libwww makes me wonder if this is more
>of a policy decision not to do it.

There was some discussion about using HTML::Encoding for parts of this;
it pretty much solves the problem for HTML and XML (cf. the list
archives). Help improving HTML::Encoding would be welcome, as I have
little time to work on it at the moment.
-- 
Björn Höhrmann · mailto:[EMAIL PROTECTED] · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 
