RE: Character Encodings

Kalnichevski, Oleg Fri, 07 Mar 2003 02:26:22 -0800

Hi Adrian

> 1. URLs should only consist of ISO-8859-1 characters whenever possible as
> this is the encoding used by RFC 1738 and using other encodings may cause
> compatibility issues with some servers (eg: Windows Web Folders).  This is
> mostly due to the fact that there is no way to determine the encoding used
> for the URL.


I believe this is correct. However, you may want to check with Sung-Gu on this matter.

> 2. The headers of a HTTP request/response must always be ISO 8859-1 (or is
> this ASCII?) as per the HTTP standard.

Header elements (status line + headers) must be in US-ASCII according to the HTTP spec.
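
A quick way to check this on the client side, as a minimal sketch in plain Java. The class and method names (AsciiHeaderCheck, isUsAscii) are made up for illustration and are not part of HttpClient:

```java
import java.nio.charset.StandardCharsets;

public class AsciiHeaderCheck {
    // Hypothetical helper: true when every character of the header line
    // can be encoded in US-ASCII.
    public static boolean isUsAscii(String headerLine) {
        return StandardCharsets.US_ASCII.newEncoder().canEncode(headerLine);
    }

    public static void main(String[] args) {
        System.out.println(isUsAscii("Content-Type: text/html; charset=UTF-8"));
        System.out.println(isUsAscii("X-Name: Jos\u00e9"));
    }
}
```

The second header line contains a non-ASCII character and would have to be encoded or rejected before it goes on the wire.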


> 3. The Content-Type: header may specify a charset for the body of the HTTP
> request/response, eg: Content-Type: text/html; charset=UTF-8

Correct

> 4. Is there any simple way to extract the charset returned by the server
> from HttpClient?  If not we probably should add one.  Obviously you could
> get the Content-Type header and parse it but since HttpClient already does
> this (I think) it would be better to avoid it.

HttpMethodBase#getResponseCharSet()
HttpMethodBase#getRequestCharSet()
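
To illustrate what such an accessor has to extract, here is a rough sketch in plain Java that pulls the charset parameter out of a Content-Type value. It is not HttpClient's actual implementation; ContentTypeCharset and charsetOf are hypothetical names, and the parsing is deliberately naive (it does not handle a ';' inside a quoted parameter value):

```java
import java.util.Locale;

public class ContentTypeCharset {
    // Hypothetical helper: returns the charset parameter of a Content-Type
    // value, or null when the header carries none.
    public static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type; parameters start at index 1.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq <= 0) {
                continue;
            }
            String name = param.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            if (name.equals("charset")) {
                String value = param.substring(eq + 1).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=UTF-8"));
    }
}
```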

> 5. getResponseBodyAsString always uses the platform default encoding.  Why
> doesn't this use the charset specified in the HTTP request?

HttpMethodBase#getResponseBodyAsString() does use the charset specified in the
response 'Content-Type' header (when available).
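
The general idea, sketched in plain Java rather than with HttpClient itself: decode the raw body bytes with the charset taken from the response, and fall back to a default when none is given or the name is unknown. The fallback to ISO-8859-1 is an assumption here, chosen because HTTP/1.1 (RFC 2616) names it as the default for text media types. BodyDecoder and decode are hypothetical names:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BodyDecoder {
    // Hypothetical helper: decodes body bytes using the given charset name,
    // falling back to ISO-8859-1 (assumed default) when the name is null,
    // malformed, or not supported by this JVM.
    public static String decode(byte[] body, String charsetName) {
        Charset charset = StandardCharsets.ISO_8859_1;
        if (charsetName != null) {
            try {
                charset = Charset.forName(charsetName);
            } catch (IllegalArgumentException e) {
                // Illegal or unsupported charset name: keep the fallback.
            }
        }
        return new String(body, charset);
    }

    public static void main(String[] args) {
        byte[] utf8 = {(byte) 0xC3, (byte) 0xA9}; // "é" encoded as UTF-8
        System.out.println(decode(utf8, "UTF-8").length());
        System.out.println(decode(utf8, null).length());
    }
}
```

The same two bytes decode to one character with UTF-8 and to two characters with the fallback, which is why the charset from the response matters.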

> 6. Some document types specify the charset inside the document itself, you
> should consult the appropriate standards to determine whether to use the
> charset specified in the HTTP response or the charset in the document.

HttpClient is not supposed to be aware of anything specific to the body content. It is
the HttpClient consumer's responsibility to ensure that the content is properly decoded.
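
As an example of what that consumer-side logic might look like for HTML only: HTML 4.01 gives the charset from the HTTP Content-Type header precedence over a META declaration inside the document, so a consumer could try the header first and only then look into the body. This is a rough sketch under that assumption; HtmlCharsetChooser and choose are hypothetical names, the regular expression is a crude approximation of real META parsing, and other formats (XML, for instance) have their own rules:

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HtmlCharsetChooser {
    // Crude pattern for a charset declared inside a <meta ...> tag.
    private static final Pattern META_CHARSET = Pattern.compile(
            "<meta[^>]*charset\\s*=\\s*[\"']?([A-Za-z0-9._:-]+)",
            Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: prefers the charset from the HTTP header, then
    // a META declaration in the first 1024 bytes, else returns null.
    public static String choose(String headerCharset, byte[] body) {
        if (headerCharset != null && !headerCharset.isEmpty()) {
            return headerCharset;
        }
        int length = Math.min(body.length, 1024);
        // ISO-8859-1 maps every byte to a character, so this never fails.
        String head = new String(body, 0, length, StandardCharsets.ISO_8859_1);
        Matcher matcher = META_CHARSET.matcher(head);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        byte[] html = "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\">"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(choose(null, html));
    }
}
```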

Cheers

Oleg
