Re: how does httpclient detect element-charset?

Oleg Kalnichevski Mon, 11 Jun 2007 02:47:11 -0700

On Mon, 2007-06-11 at 17:27 +0800, Feng Jiang wrote:
> Hi all,
> 
> I think the implementation of HttpMethodParams#getHttpElementCharset() has a
> problem. By default, httpclient chooses US-ASCII as the charset to
> decode HTTP elements, such as headers.
> 
> But I do meet some servers from which the LOCATION header is in some other
> charset, such as UTF-8, so that httpclient cannot handle the
> redirection correctly (in my application, I handle it myself). For
> example, one server responds with a header such as:
> 
> Location: http://www.abc.com/****(some chinese character)/hello/world
> 
> The above URL contains some Chinese characters in some other charset, such
> as GBK. The right behavior for httpclient would be: 1. detect the charset of the
> URL. 2. decode the URL in that charset to a java.lang.String. 3.
> construct the correct header instance.
> 
> Am I right?
>


Not really. The use of non-ASCII characters in HTTP head elements (such
as headers or a request line) is a violation of the HTTP specification.
You can explicitly override the standard charset with a non-standard one
such as UTF-8 or GBK by setting the 'http.protocol.element-charset'
parameter, but I do not think HttpClient should attempt to 'guess' the
charset being used.
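
To illustrate why the element charset matters, here is a minimal JDK-only
sketch (not HttpClient code) that decodes the same header bytes once as
US-ASCII and once as UTF-8. The class name, the helper method and the sample
Location value are made up for the example. In HttpClient 3.x the equivalent
override would presumably be something like
client.getParams().setParameter("http.protocol.element-charset", "UTF-8"),
set before the request is executed.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it only mimics what an element charset does.
final class ElementCharsetDemo {

    // Decode raw header bytes with the configured element charset.
    static String decodeHeader(byte[] raw, Charset elementCharset) {
        return new String(raw, elementCharset);
    }

    public static void main(String[] args) {
        // Assumed example: a non-compliant server sends UTF-8 bytes in Location.
        byte[] wire = "Location: http://www.abc.com/\u4e2d\u6587/hello/world"
                .getBytes(StandardCharsets.UTF_8);

        // Default behaviour: non-ASCII bytes cannot be mapped and are replaced.
        System.out.println(decodeHeader(wire, StandardCharsets.US_ASCII));

        // Explicit override: the original characters are recovered.
        System.out.println(decodeHeader(wire, StandardCharsets.UTF_8));
    }
}
```

The override only helps when you already know which charset the server uses.
If the server sent GBK and the client were configured for UTF-8, the result
would be garbled just the same.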

For details see:

http://jakarta.apache.org/commons/httpclient/charencodings.html

Oleg

> Feng


