In <[EMAIL PROTECTED]> on 11 May 2004, Louise M. Mitchell <Mitchell> wrote:

> I need to grab the encoding of pages I'm retrieving with
> LWP::UserAgent... my perusal of the documentation indicated I could use
> the LWP::MediaTypes to get the encoding...

No, that's for guessing when other information is not present.

Look for charset info in $response->header('Content-Type'). If charset
info is not present there, then the HTTP specs say the charset defaults
to ISO-8859-1; but the HTML 4.01 spec says the charset doesn't default
to anything in that case.

The charset information in the Content-Type header takes precedence over
other possible sources of charset info such as an XML declaration or
<meta> tag. If you wish to examine the meta tags, you should use one of
the HTML parsers to parse the response content.

(Finally, if you care, there is also a spec on how to guess the charset
of documents where none has been specified. The procedure is basically
to go through the response incrementally, seeing what charsets could
legally contain all the content encountered so far, until only one
remains or the end of the content has been reached.)

-ccwf
-- 
Charles C. Fu          ,--    Founder
 ___ __ __. .        ,-/--    Web i18n, LLC
(_,(_,|/|/ /                  www.web-i18n.net
                 ----'
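
As a rough illustration of the Content-Type advice above, here is a minimal Perl sketch that extracts the charset parameter from a header value. The helper name charset_from_content_type is hypothetical and not part of LWP; it only parses a string, so what to do when no charset is declared (fall back to ISO-8859-1 per HTTP, or inspect the content) is left to the caller.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of LWP): pull the charset parameter out of a
# Content-Type header value such as 'text/html; charset="UTF-8"'.
# Returns the lowercased charset name, or undef if none is declared.
sub charset_from_content_type {
    my ($content_type) = @_;
    return unless defined $content_type;

    # Accept either charset=token or charset="quoted" after a ';' separator.
    if ($content_type =~ /;\s*charset\s*=\s*(?:"([^"]*)"|([^\s;"]+))/i) {
        my $charset = defined $1 ? $1 : $2;
        return lc $charset;
    }
    return;
}

# Sample header values; with LWP the string would come from
# $response->header('Content-Type') instead.
for my $header ('text/html; charset=UTF-8',
                'text/html; charset="ISO-8859-2"',
                'text/html') {
    my $charset = charset_from_content_type($header);
    printf "%-35s => %s\n", $header,
        defined $charset ? $charset : '(none declared)';
}
```

With a real response, the call would look something like charset_from_content_type(scalar $response->header('Content-Type')). An undef result is the case where the HTTP and HTML 4.01 specs disagree about the default.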