* David Nesting wrote:
>The most complete implementation imaginable would start with at least these:
>
>text/html (html-specific rules)
>text/xml (xml-specific rules)
>text/* (general-purpose text rules)
>application/*+xml (xml-specific rules)

HTML::Encoding handles all of these except text/*. For text/* there are
no rules beyond checking the charset parameter, though you might also
check for a Unicode signature at the beginning, which almost always
indicates the Unicode encoding form. HTML::Encoding can do both, but it
is not designed to do that for arbitrary types.

>On the other hand, I'm less convinced now that dipping into the HTML or XML
>content to figure out the proper encoding is necessarily the proper thing to
>do here. My complaint about LWP::Simple was that the HTTP Content-Type
>(charset) information is lost by the time it gets to the caller.

Well, that is necessarily so, to keep the interface simple. Going from
LWP::Simple::get to LWP::UserAgent->new->get(...) is easy enough that it
does not warrant adding functionality to LWP::Simple.

>I could see a case then for dealing with text/* only and returning octets
>for everything else, since text/* is the only media type that has character
>encoding details in the HTTP headers.

Actually, that is not the case. There are plenty of, say, application/*
formats, like the XML types, that carry encoding information in the
header without replicating it in the content. Likewise, information in
the content may not be replicated in the header, and the two may
contradict each other.

>Yes, it's still "their fault" for not coding a robust application, but
>helping them do that is I think still a valid goal, if we can do it safely.

Well, automagic decoding of content cannot be added to LWP::Simple
without some opt-in switch, as that would break a lot of programs. And
if you require an opt-in, you might as well require switching to a
different module.
--
Björn Höhrmann · mailto:[EMAIL PROTECTED] · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
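
For illustration only, here is a minimal sketch of the two text/* checks mentioned above: reading the charset parameter from the Content-Type header, and looking for a Unicode signature at the start of the octets. The function names are hypothetical and are not part of HTML::Encoding or LWP. Trying the header first and the signature second is an assumption of this sketch, not something the message prescribes.

```perl
use strict;
use warnings;

# Hypothetical helpers for illustration; not the HTML::Encoding API.
# Order matters: the UTF-32LE signature begins with the UTF-16LE one,
# so the longer signatures must be tested first.
my @SIGNATURES = (
    [ "\x00\x00\xFE\xFF", 'UTF-32BE' ],
    [ "\xFF\xFE\x00\x00", 'UTF-32LE' ],
    [ "\xEF\xBB\xBF",     'UTF-8'    ],
    [ "\xFE\xFF",         'UTF-16BE' ],
    [ "\xFF\xFE",         'UTF-16LE' ],
);

# Return the encoding form implied by a leading Unicode signature,
# or undef if the octets do not start with one.
sub charset_from_signature {
    my ($octets) = @_;
    for my $sig (@SIGNATURES) {
        my ($bytes, $name) = @$sig;
        return $name if substr($octets, 0, length $bytes) eq $bytes;
    }
    return;
}

# Return the lowercased charset parameter of a Content-Type header
# value, or undef if the header is missing or has no such parameter.
sub charset_from_content_type {
    my ($header) = @_;
    return unless defined $header;
    return lc($1) if $header =~ /;\s*charset\s*=\s*"?([^\s";]+)"?/i;
    return;
}

# Assumption of this sketch: prefer the header, then the signature.
sub guess_text_charset {
    my ($content_type, $octets) = @_;
    my $charset = charset_from_content_type($content_type);
    return $charset if defined $charset;
    return charset_from_signature($octets);
}
```

A caller would pass the raw Content-Type header value and the undecoded response body to guess_text_charset() and treat an undef result as "unknown". The sketch does not address the case where the header and the signature disagree.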