On Mon, 2007-06-11 at 18:38 +0800, Feng Jiang wrote:
> I agree with you. But I do find that a lot of servers behave that way. Some
> Location headers are in GBK, and some are in UTF-8. The only thing I can do
> is to hack around it in the code.
> 
> I think HttpClient should provide a mechanism to handle it. HttpClient has a
> dummy detector for the charset of a URL, which always returns
> "US-ASCII", but it allows the user to override it.
> 

The trouble is I do not know of a reliable way to detect any arbitrary
charset of a request URI. If you do, we'll happily accept such a
contribution. Maybe one can tell GBK from UTF-8 but certainly not
Win1251 from KOI8-R.
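
To illustrate the point, here is a minimal sketch using only the JDK, not
HttpClient. The class and method names (CharsetGuessSketch, decodesCleanly,
guessUtf8OrGbk) are hypothetical, and the approach is just one possible
heuristic: treat the raw bytes as UTF-8 if they decode strictly, otherwise
try GBK. It also shows why the same trick fails for single-byte Cyrillic
charsets: the sample bytes are valid in both, so strict decoding gives no
signal.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of HttpClient.
class CharsetGuessSketch {

    // True if the bytes decode in the given charset with no malformed
    // or unmappable input.
    static boolean decodesCleanly(byte[] raw, Charset cs) {
        try {
            cs.newDecoder()
              .onMalformedInput(CodingErrorAction.REPORT)
              .onUnmappableCharacter(CodingErrorAction.REPORT)
              .decode(ByteBuffer.wrap(raw));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Heuristic only: strict UTF-8 first, then GBK, else null.
    // UTF-8 must be tried first because many UTF-8 byte sequences
    // also happen to be valid GBK.
    static Charset guessUtf8OrGbk(byte[] raw) {
        if (decodesCleanly(raw, StandardCharsets.UTF_8)) {
            return StandardCharsets.UTF_8;
        }
        Charset gbk = Charset.forName("GBK"); // assumes the JRE ships GBK
        return decodesCleanly(raw, gbk) ? gbk : null;
    }

    public static void main(String[] args) {
        String path = "/\u4f60\u597d/hello/world"; // made-up sample path
        System.out.println(guessUtf8OrGbk(path.getBytes(StandardCharsets.UTF_8)));
        System.out.println(guessUtf8OrGbk(path.getBytes(Charset.forName("GBK"))));

        // The same Cyrillic bytes are valid in both single-byte charsets,
        // so strict decoding cannot tell them apart.
        byte[] koi = "\u043f\u0440\u0438\u0432\u0435\u0442"
                .getBytes(Charset.forName("KOI8-R"));
        System.out.println(decodesCleanly(koi, Charset.forName("KOI8-R")));
        System.out.println(decodesCleanly(koi, Charset.forName("windows-1251")));
    }
}
```

Even for the UTF-8/GBK case this remains a guess: short or unusual byte
sequences can be valid in both, which is why I would not want it as
default behaviour.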

The whole point of having standards is that parties involved agree on
some common conventions and they actually stick to them. 

Oleg


> Feng
> 
> On 6/11/07, Oleg Kalnichevski <[EMAIL PROTECTED]> wrote:
> >
> > On Mon, 2007-06-11 at 17:27 +0800, Feng Jiang wrote:
> > > Hi all,
> > >
> > > I think the implementation of HttpMethodParams#getHttpElementCharset()
> > > has a problem. By default, HttpClient chooses US-ASCII as the charset
> > > used to decode HTTP elements, such as some headers.
> > >
> > > But I do come across servers whose Location header is in some other
> > > charset, such as UTF-8, so that HttpClient cannot handle the
> > > redirect correctly (in my application, I handle it myself). For
> > > example, one server responds with a header like this:
> > >
> > > Location: http://www.abc.com/****(some chinese character)/hello/world
> > >
> > > The above URL contains some Chinese characters in some other charset,
> > > such as GBK. The right behaviour for HttpClient would be: 1. detect the
> > > charset of the URL; 2. decode the URL in that charset to a
> > > java.lang.String; 3. construct the correct header instance.
> > >
> > > Am I right?
> > >
> >
> > Not really. The use of non-ASCII characters in HTTP head elements (such
> > as headers or a request line) is a violation of the HTTP specification.
> > You can explicitly override the standard charset with a non-standard one
> > such as UTF-8 or GBK by setting the 'http.protocol.element-charset'
> > parameter, but I do not think HttpClient should attempt to 'guess' the
> > charset being used.
> >
> > For details see:
> >
> > http://jakarta.apache.org/commons/httpclient/charencodings.html
> >
> > Oleg
> >
> > > Feng
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
> >
> >

