* David Nesting wrote:
>The most complete implementation imaginable would start with at least these:
>
>text/html (html-specific rules)
>text/xml (xml-specific rules)
>text/* (general-purpose text rules)
>application/*+xml (xml-specific rules)

HTML::Encoding handles all of these except text/*. For text/* there are
no rules beyond checking the charset parameter, though you might also
check for a Unicode signature (byte order mark) at the beginning of the
content, which almost always indicates the Unicode encoding form.
HTML::Encoding can do both, but it is not designed to do so for
arbitrary types.
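
A minimal sketch of those two checks for text/*, in plain Perl without
HTML::Encoding. The helper names are made up for illustration, the
charset regex is a simplification of the real header grammar, and
letting the charset parameter win over the signature is an assumption
on my part, not a rule taken from any specification.

```perl
use strict;
use warnings;

# Hypothetical helper: map a leading Unicode signature to an encoding name.
# Longer signatures come first because the UTF-32LE signature starts with
# the same two bytes as the UTF-16LE one.
sub encoding_from_bom {
    my ($octets) = @_;
    my @signatures = (
        [ "\x00\x00\xFE\xFF", 'UTF-32BE' ],
        [ "\xFF\xFE\x00\x00", 'UTF-32LE' ],
        [ "\xEF\xBB\xBF",     'UTF-8'    ],
        [ "\xFE\xFF",         'UTF-16BE' ],
        [ "\xFF\xFE",         'UTF-16LE' ],
    );
    for my $sig (@signatures) {
        my ($bom, $name) = @$sig;
        return $name if substr($octets, 0, length $bom) eq $bom;
    }
    return undef;
}

# Hypothetical helper: pull the charset parameter out of a Content-Type
# value. Simplified: accepts charset=token and charset="quoted" only.
sub charset_from_content_type {
    my ($header) = @_;
    return undef unless defined $header;
    if ($header =~ /;\s*charset\s*=\s*(?:"([^"]*)"|([^\s;"]+))/i) {
        return defined $1 ? $1 : $2;
    }
    return undef;
}

# Assumed precedence: charset parameter first, signature as fallback.
sub guess_text_encoding {
    my ($content_type, $octets) = @_;
    return charset_from_content_type($content_type)
        // encoding_from_bom($octets);
}
```

With this, guess_text_encoding('text/plain', "\xEF\xBB\xBFabc") yields
UTF-8, and undef comes back when neither source says anything.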

>On the other hand, I'm less convinced now that dipping into the HTML or XML
>content to figure out the proper encoding is necessarily the proper thing to
>do here.  My complaint about LWP::Simple was that the HTTP Content-Type
>(charset) information is lost by the time it gets to the caller.

Well, that is necessarily so, to keep the interface simple. Switching
from LWP::Simple::get to LWP::UserAgent->new->get(...) is easy enough
that adding functionality to LWP::Simple is not warranted.
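
For illustration, a small sketch of what that switch looks like. The
wrapper name fetch_octets_and_type is hypothetical; the calls inside it
(get, is_success, status_line, content, header) are the ordinary
LWP::UserAgent and HTTP::Response methods.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical wrapper: fetch a URL and return the raw octets together
# with the Content-Type header value, which LWP::Simple::get discards.
sub fetch_octets_and_type {
    my ($url) = @_;
    my $response = LWP::UserAgent->new->get($url);
    die "GET $url failed: " . $response->status_line . "\n"
        unless $response->is_success;
    return ($response->content, $response->header('Content-Type'));
}

# Usage (the URL is a placeholder):
#   my ($octets, $type) = fetch_octets_and_type('http://www.example.org/');
```

The caller then has both the undecoded body and whatever charset
parameter the server sent, and can decide what to do with them.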

>I could see a case then for dealing with text/* only and returning octets
>for everything else, since text/* is the only media type that has character
>encoding details in the HTTP headers.

Actually, that is not the case. Plenty of application/* formats, such
as the XML types, carry encoding information in the header without
replicating it in the content. Likewise, information in the content may
not be replicated in the header, and the two may contradict each other.
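
To make the contradiction case concrete, here is a rough sketch that
compares a charset taken from the header with the encoding named in an
XML declaration. Both function names are invented for this example. The
declaration parser only works for ASCII-compatible encodings, and the
comparison is by name only, so two aliases of the same encoding would be
reported as a conflict.

```perl
use strict;
use warnings;

# Hypothetical helper: read the encoding pseudo-attribute from an XML
# declaration. Simplified: the declaration must be in an ASCII-compatible
# encoding, so UTF-16 and UTF-32 documents are not handled here.
sub xml_declared_encoding {
    my ($octets) = @_;
    return $1 if $octets =~ m{
        \A (?:\xEF\xBB\xBF)?        # optional UTF-8 signature
        <\?xml \s [^>]*?            # start of the XML declaration
        \b encoding \s* = \s*
        ["'] ([A-Za-z][A-Za-z0-9._-]*) ["']
    }x;
    return undef;
}

# Returns (header charset, declared encoding) when both are present and
# their names differ, and an empty list otherwise.
sub encoding_conflict {
    my ($header_charset, $octets) = @_;
    my $declared = xml_declared_encoding($octets);
    return unless defined $header_charset && defined $declared;
    return if lc($header_charset) eq lc($declared);
    return ($header_charset, $declared);
}
```

What to do when a conflict is found depends on the media type, which is
why this only reports it.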

>Yes, it's still "their fault" for not coding a robust application, but
>helping them do that is I think still a valid goal, if we can do it safely.

Well, automagic decoding of content cannot be added to LWP::Simple
without some opt-in switch, as that would break a lot of programs, and
if you require an opt-in, you might as well require switching to a
different module.
-- 
Björn Höhrmann · mailto:[EMAIL PROTECTED] · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 
