On Mon, 2007-06-11 at 18:38 +0800, Feng Jiang wrote:
> I agree with you. But I do find that a lot of servers act that way. Some
> Location headers are in GBK, and some are in UTF-8. The only thing I can do
> is to hack the code.
>
> I think httpclient should provide a mechanism to handle it. Httpclient has a
> dummy detector to detect the charset of a URL, which always returns
> "US-ASCII". But it allows the user to override it.
>

The trouble is I do not know of a reliable way to detect an arbitrary
charset of a request URI. If you do, we'll happily accept such a
contribution. Maybe one can tell GBK from UTF-8, but certainly not Win1251
from KOI8-R. The whole point of having standards is that the parties
involved agree on some common conventions and actually stick to them.

Oleg

> Feng
>
> On 6/11/07, Oleg Kalnichevski <[EMAIL PROTECTED]> wrote:
> >
> > On Mon, 2007-06-11 at 17:27 +0800, Feng Jiang wrote:
> > > Hi all,
> > >
> > > I think the implementation of HttpMethodParams#getHttpElementCharset()
> > > has a problem. By default, httpclient will choose US-ASCII as the
> > > charset to decode HTTP elements, such as some headers.
> > >
> > > But I do meet some servers from which the Location header is in some
> > > other charset, such as UTF-8, so that httpclient cannot handle the
> > > redirection correctly (in my application, I handle it myself). For
> > > example, one server responds with such a header:
> > >
> > > Location: http://www.abc.com/****(some chinese character)/hello/world
> > >
> > > The above URL contains some Chinese characters in some other charset,
> > > such as GBK. The right way for httpclient should be:
> > > 1. detect the charset of the URL.
> > > 2. decode the URL in that correct charset to a java.lang.String.
> > > 3. construct the correct header instance.
> > >
> > > Am I right?
> > >
> >
> > Not really. The use of non-ASCII characters in HTTP head elements (such
> > as headers or a request line) is a violation of the HTTP specification.
> > You can explicitly override the standard charset with a non-standard one
> > such as UTF-8 or GBK by setting the 'http.protocol.element-charset'
> > parameter, but I do not think HttpClient should attempt to 'guess' the
> > charset being used.
> >
> > For details see:
> >
> > http://jakarta.apache.org/commons/httpclient/charencodings.html
> >
> > Oleg
> >
> > > Feng

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
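
To illustrate the point about telling GBK from UTF-8, here is a minimal sketch in plain Java (java.nio.charset only). It is not part of HttpClient: the class HeaderCharsetGuess and its methods strictDecode and decodeHeaderValue are hypothetical names made up for this example. The idea is that UTF-8 has strict byte patterns, so a strict decode usually fails on GBK bytes, and the caller can then fall back to a charset of their own choosing. The same trick cannot separate Win1251 from KOI8-R, because both are single-byte encodings in which nearly every byte value maps to some character, so a strict decode succeeds either way.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not an HttpClient class.
final class HeaderCharsetGuess {

    // Decodes raw bytes strictly; returns null if they are not valid in cs.
    static String strictDecode(byte[] raw, Charset cs) {
        CharsetDecoder decoder = cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return decoder.decode(ByteBuffer.wrap(raw)).toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    // Heuristic only: try strict UTF-8 first, otherwise use the caller's fallback.
    // Pure ASCII input is valid UTF-8, so standards-compliant headers are unaffected.
    static String decodeHeaderValue(byte[] raw, Charset fallback) {
        String utf8 = strictDecode(raw, StandardCharsets.UTF_8);
        return utf8 != null ? utf8 : new String(raw, fallback);
    }

    public static void main(String[] args) {
        byte[] utf8 = {(byte) 0xE4, (byte) 0xB8, (byte) 0xAD}; // U+4E2D encoded as UTF-8
        byte[] gbk = {(byte) 0xD6, (byte) 0xD0};               // U+4E2D encoded as GBK
        // GBK may be missing on a stripped-down runtime, so guard the lookup.
        Charset fallback = Charset.isSupported("GBK")
                ? Charset.forName("GBK")
                : StandardCharsets.ISO_8859_1;
        // Prints whether both byte sequences decoded to the same string.
        System.out.println(
                decodeHeaderValue(utf8, fallback).equals(decodeHeaderValue(gbk, fallback)));
    }
}
```

This remains a guess, not a detection: a short GBK sequence can happen to be valid UTF-8, which is one reason to prefer setting 'http.protocol.element-charset' explicitly when the server's charset is known.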
