Makes sense to me. Because the encoding is usually declared in the body itself, setting the encoding in getResponseBodyAsString() doesn't help that much. It also means that you can't rely on getResponseBodyAsString() for all purposes. Some other layer of the client application has to manage the encoding.
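To make that "other layer" concrete, here is a rough sketch of what it might look like. This is only an illustration under assumptions: BodyDecoder and its methods are made-up names, not part of HttpClient, and it needs a modern JDK for java.nio.charset.StandardCharsets. It takes the raw bytes (as you would get from getResponseBody()) plus the Content-Type header value, tries the header's charset first, then scans the start of the body as ISO-8859-1 for a META charset, and falls back to ISO-8859-1.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of the HttpClient API. */
public final class BodyDecoder {

    // charset=... as it appears in a Content-Type header value.
    private static final Pattern HEADER_CHARSET = Pattern.compile(
            "charset\\s*=\\s*[\"']?([A-Za-z0-9_.:-]+)", Pattern.CASE_INSENSITIVE);

    // charset=... anywhere inside a <meta ...> tag (a crude scan, not an HTML parser).
    private static final Pattern META_CHARSET = Pattern.compile(
            "<meta[^>]*?charset\\s*=\\s*[\"']?([A-Za-z0-9_.:-]+)", Pattern.CASE_INSENSITIVE);

    // Number of leading bytes to scan for a META tag (an arbitrary choice).
    private static final int SNIFF_LIMIT = 2048;

    private BodyDecoder() {}

    /** Returns the charset named in text, or null if none is named or the JVM lacks it. */
    private static Charset find(Pattern pattern, String text) {
        if (text == null) {
            return null;
        }
        Matcher m = pattern.matcher(text);
        if (!m.find()) {
            return null;
        }
        try {
            return Charset.forName(m.group(1));
        } catch (IllegalArgumentException e) {
            // Covers both illegal and unsupported charset names.
            return null;
        }
    }

    /** Picks a charset: header first, then META tag, then ISO-8859-1. */
    public static Charset detect(byte[] body, String contentType) {
        Charset fromHeader = find(HEADER_CHARSET, contentType);
        if (fromHeader != null) {
            return fromHeader;
        }
        // ISO-8859-1 maps every byte to a char, so the ASCII markup survives intact.
        int n = Math.min(body.length, SNIFF_LIMIT);
        String head = new String(body, 0, n, StandardCharsets.ISO_8859_1);
        Charset fromMeta = find(META_CHARSET, head);
        return fromMeta != null ? fromMeta : StandardCharsets.ISO_8859_1;
    }

    /** Decodes the original bytes with whatever charset detect() settled on. */
    public static String decode(byte[] body, String contentType) {
        return new String(body, detect(body, contentType));
    }
}
```

A caller would pass in the bytes and the header value it already has, e.g. BodyDecoder.decode(bytes, "text/html"). Note that the string is always rebuilt from the original bytes, since the ISO-8859-1 pass is only good for finding the META tag.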
I still see the use of get...AsString, of course. It could be an in-between step whose output is sent to a parser to determine the actual encoding, but then you would need to return to the original byte stream anyway to rebuild the string from the body. Maybe the documentation should reflect this. Also, if people start using charset info in the future, it would probably be nice to provide support.

It might be that body-to-string conversion belongs somewhere else in the API. Any ideas? My first guess would be a utility class that can work out the correct encoding, from the header and maybe even by parsing the content. However, I don't think I am familiar enough with the API to say decisively.

I do know that such features might be very useful for some work that I need to do in the near future. I am working on software that needs to interact with several languages with non-Latin character sets.

- Rapheal Kaplan

On Wednesday 20 March 2002 14:27, you wrote:
> I've had to deal with this problem myself. Right now the only solution is
> to use getResponseBody() and convert bytes into a string using the
> appropriate encoding. I like the idea of having getResponseBodyAsString()
> use the encoding specified in the Content-Type header, but the problem is
> that it still won't be very useful.
>
> The vast majority of web servers out there don't include a "; charset="
> attribute in the content-type header or provide a reasonable mechanism for
> content authors to cause the server to set the attribute correctly on a
> per-file basis. Most pages with non-ISO-LATIN-1 charsets use <META
> HTTP-EQUIV> tag in the HTML header to specify the page encoding. That
> means you still have to read at least part of the response body (as
> ISO-LATIN-1) in order to determine the correct encoding.
>
> I don't have a problem with changing getResponseBodyAsString() to check the
> content-type header, I just doubt that doing that will make it much more
> useful in the real world.
>
> What do others think?
>
> Marc Saegesser
