Re: [HttpClient]Encoding

Rapheal Kaplan Wed, 20 Mar 2002 11:42:17 -0800

  Makes sense to me.  Because the encoding is often declared in the body 
itself, setting the encoding in the getResponseBodyAsString method doesn't 
necessarily help that much.  It also means that you can't rely on the 
getResponseBodyAsString method for all purposes.  Some other layer of the 
client application needs to manage encoding.

  I still see the use of get...AsString, of course.  It could be an 
in-between step whose output is sent to a parser to determine the actual 
encoding, but then you would need to return to the original byte stream 
anyway to decode the body again.  Maybe the documentation should say so.
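
  To make that two-pass idea concrete, here is a rough sketch in plain Java.  
It is not part of HttpClient.  The class name MetaCharsetSniffer, the 4096-byte 
scan limit and the regular expression are all my own assumptions, and it 
assumes a JDK that has java.nio.charset.StandardCharsets.  The bytes would 
come from getResponseBody().

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not an HttpClient class.
final class MetaCharsetSniffer {
    // Only the start of the body is scanned; the limit is an arbitrary choice.
    private static final int SNIFF_LIMIT = 4096;

    // Loose match for charset=... inside a <meta ...> tag. This is a
    // heuristic, not a real HTML parser.
    private static final Pattern META_CHARSET = Pattern.compile(
            "<meta[^>]*charset\\s*=\\s*[\"']?([A-Za-z0-9._:-]+)",
            Pattern.CASE_INSENSITIVE);

    // Pass 1: read the head of the body as ISO-8859-1 (every byte maps to
    // a character, so this never fails) and look for a declared charset.
    static Charset sniff(byte[] body) {
        int n = Math.min(body.length, SNIFF_LIMIT);
        String head = new String(body, 0, n, StandardCharsets.ISO_8859_1);
        Matcher m = META_CHARSET.matcher(head);
        if (m.find()) {
            try {
                return Charset.forName(m.group(1));
            } catch (IllegalArgumentException e) {
                // Illegal or unsupported charset name: use the default below.
            }
        }
        return StandardCharsets.ISO_8859_1;
    }

    // Pass 2: go back to the original bytes and decode them properly.
    static String decode(byte[] body) {
        return new String(body, sniff(body));
    }
}
```

  Note that the string from the first pass is thrown away; only the raw 
bytes are kept, which is why get...AsString alone is not enough.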

  Also, if people start using charset info in the future, it would probably 
be nice to provide support.  It might be that doing body to string conversion 
should be somewhere else in the API.  Any ideas?

  My first guess would be to have a utility class that can determine the 
correct encoding from the header, and maybe even by parsing the content.  
However, I don't think I am familiar enough with the API to say decisively.
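
  As a starting point, the header half of such a utility might look something 
like the sketch below.  ResponseCharset and its two methods are hypothetical 
names, not existing HttpClient API, and the ISO-8859-1 fallback is an 
assumption on my part.  The caller passes in the raw bytes and the 
Content-Type header value.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical utility class, not part of HttpClient.
final class ResponseCharset {

    // Returns the charset named in a Content-Type header value such as
    // "text/html; charset=UTF-8", or the fallback if none is usable.
    static Charset fromContentType(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type itself; parameters start at index 1.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq < 0) {
                continue;
            }
            String name = param.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            if (!name.equals("charset")) {
                continue;
            }
            String value = param.substring(eq + 1).trim();
            // Strip optional double quotes around the value.
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            try {
                return Charset.forName(value);
            } catch (IllegalArgumentException e) {
                // Illegal or unsupported charset name.
                return fallback;
            }
        }
        return fallback;
    }

    // Decodes the raw body bytes using the header charset, else ISO-8859-1.
    static String bodyAsString(byte[] body, String contentType) {
        return new String(body, fromContentType(contentType, StandardCharsets.ISO_8859_1));
    }
}
```

  Parsing the content itself could then be layered on top as a second 
fallback when the header carries no charset.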

  I do know that such features might be very useful for some work 
that I need to do in the near future.  I am working on software that needs 
to handle several languages with non-Latin character sets.

  - Rapheal Kaplan

On Wednesday 20 March 2002 14:27, you wrote:
> I've had to deal with this problem myself.  Right now the only solution is
> to use getResponseBody() and convert bytes into a string using the
> appropriate encoding.  I like the idea of having getResponseBodyAsString()
> use the encoding specified in the Content-Type header, but the problem is
> that it still won't be very useful.
>
> The vast majority of web servers out there don't include a "; charset="
> attribute in the content-type header or provide a reasonable mechanism for
> content authors to cause the server to set the attribute correctly on a
> per-file basis.  Most pages with non-ISO-LATIN-1 charsets use a <META
> HTTP-EQUIV> tag in the HTML header to specify the page encoding.  That
> means you still have to read at least part of the response body (as
> ISO-LATIN-1) in order to determine the correct encoding.
>
> I don't have a problem with changing getResponseBodyAsString() to check the
> content-type header, I just doubt that doing that will make it much more
> useful in the real world.
>
> What do others think?
>
> Marc Saegesser
>

