* David Nesting wrote:
>For most uses of libwww, developers do little with character encoding.
>Indeed, for general-case use of LWP::Simple, they can't, because that
>information isn't even exposed. Has any thought gone into doing this
>internally within libwww, so that when I fetch content, I get back text
>instead of octets?
Generally speaking, this is rather difficult, as some content may not be
textual at all, and textual formats vary in how applications are to
detect the encoding (e.g., XML has different rules than HTML, text/plain
has no rules beyond looking at the charset parameter, and so on). If you
want a general-purpose solution, a good start would be a module that
takes an HTTP::Response object, detects the encoding, and possibly
decodes the content on request.

>I'd be happy to help work on some of this, but the fact that I see no
>use of character encodings within libwww makes me wonder if this is more
>of a policy decision not to do it.

There was some discussion about using HTML::Encoding for parts of this,
which pretty much solves the problem for HTML and XML; see the list
archives. Help improving HTML::Encoding would be welcome; I have little
time to work on it at the moment.

-- 
Björn Höhrmann · mailto:[EMAIL PROTECTED] · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Phone: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
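A minimal sketch of the kind of module suggested in the reply, written in Perl to match libwww. Everything here is hypothetical: `is_textual`, `response_charset` and `decode_response` are illustrative names, not part of libwww. The sketch treats only text/* and XML media types as text, consults only the charset parameter of the Content-Type header, and deliberately ignores in-document declarations (an HTML <meta> element, the XML declaration), which the reply points to HTML::Encoding for. The object passed in only needs `header` and `content` methods, as HTTP::Response provides.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helpers, not part of libwww. $res may be an HTTP::Response
# or any object offering header('Content-Type') and content().

# True only for media types treated as text here: text/* and XML types.
sub is_textual {
    my ($content_type) = @_;
    return 0 unless defined $content_type;
    my ($type) = $content_type =~ m{^\s*([^;\s]+)};
    return (defined $type && $type =~ m{^text/|[/+]xml$}i) ? 1 : 0;
}

# Charset label from the Content-Type charset parameter, lowercased;
# undef for non-textual responses or when no parameter is present.
# In-document declarations (<meta>, <?xml encoding=...?>) are ignored.
sub response_charset {
    my ($res) = @_;
    my $ct = $res->header('Content-Type');
    return undef unless is_textual($ct);
    return $ct =~ /;\s*charset\s*=\s*"?([^";\s]+)"?/i ? lc($1) : undef;
}

# Decode the body to a Perl character string on request. $fallback is
# used only when the header names no charset; returns undef for
# non-textual content or an unknown charset label.
sub decode_response {
    my ($res, $fallback) = @_;
    return undef unless is_textual($res->header('Content-Type'));
    my $charset = response_charset($res) // $fallback;
    return undef unless defined $charset;
    my $enc = Encode::find_encoding($charset) or return undef;
    return $enc->decode($res->content);
}
```

With LWP this might be called as `my $text = decode_response(LWP::UserAgent->new->get($url), 'iso-8859-1');`. The ISO-8859-1 fallback (the RFC 2616 default for text/* types received over HTTP) is left to the caller rather than built in, because, as noted above, the right default differs between formats.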