Brad,

(1) Please consider upgrading to HttpClient 2.0.2 (or better yet to
HttpClient 3.0). 

If my memory does not fail me, there was a bug in HttpClient 2.0.0 that
resulted in inaccurate logging of Unicode chars. Please note that the
real content sent to the target server IS correctly encoded, it is the
wire logging that gets screwy.

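(2) No wonder the input gets truncated: Content-Length is counted in
bytes, not chars, and UTF-8 encoding may use several bytes to represent
a single char. characters.length() therefore understates the size of
the request body.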

So, this is not quite right:

post.setRequestBody(characters);
post.setRequestContentLength(characters.length());
post.setRequestHeader("Content-type", "text/plain; charset=UTF-8");

Try this instead:

String charset = "UTF-8";
byte[] raw = characters.getBytes(charset);
post.setRequestBody(new ByteArrayInputStream(raw));
post.setRequestContentLength(raw.length);
post.setRequestHeader("Content-type", "text/plain; charset=" + charset);
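
To see the mismatch without involving HttpClient at all, here is a minimal standalone sketch that compares the char count with the UTF-8 byte count. The class name, the helper utf8ByteLength and the sample strings are made up for illustration, and it assumes a JDK that provides java.nio.charset.StandardCharsets (Java 7 or later); on an older JDK, getBytes("UTF-8") does the same job.

```java
import java.nio.charset.StandardCharsets;

public class Utf8LengthDemo {

    // Number of bytes the string occupies once encoded as UTF-8.
    // This is the value Content-Length has to carry, not s.length().
    static int utf8ByteLength(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // Pure ASCII: one byte per char, so both counts agree.
        String ascii = "Abcde";
        // Hypothetical sample: 'X' followed by three accented Latin letters,
        // each of which needs two bytes in UTF-8.
        String accented = "X\u00e9\u00e8\u00fc";

        System.out.println(ascii.length() + " chars, "
                + utf8ByteLength(ascii) + " bytes");
        System.out.println(accented.length() + " chars, "
                + utf8ByteLength(accented) + " bytes");
    }
}
```

If the char count is sent as the content length, the server stops reading after that many bytes, which is why the tail of the body goes missing as soon as the text contains non-ASCII chars.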

Hope this helps

Oleg


On Wed, 2005-03-16 at 16:10 -0500, Brad Hadfield wrote:
> Hello,
> 
> I would greatly appreciate your help.
> 
> I am aware that an earlier post mentions problems with character 
> encodings. I've also read the material on the httpclient site. I can only
> assume that I am doing something incorrectly or my problems are due to a 
> lack of understanding concerning character encoding.
> 
> Specifically we are sending XML in the body of a post.  As an example of 
> the kind of output I am getting I have created a small example.
> 
> The following code segment results in the log output below:
> 
>     log.debug("Start.");
>     String characters =
>         "These are the characters - Abcde;+[X�����] - some may fail.";
> 
>     log.debug("Content String: " + characters);
> 
>     URL url = new URL("http://localhost:1234/ppsi/xxx.hxi");
>     PostMethod post = new PostMethod();
>     post.setRequestBody(characters);
>     post.setRequestContentLength(characters.length());
>     post.setRequestHeader("Content-type", "text/plain; charset=UTF-8");
> 
>     HttpClient cl = new HttpClient();
>     cl.getHostConfiguration().setHost(url.getHost(), url.getPort());
>     cl.executeMethod(post);
> 
>     log.debug("end.");
> 
> 
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> main           | DEBUG test.PPSTest          Start.
> 
> main           | DEBUG test.PPSTest          Content String: These are 
> the characters - Abcde;+[X�����] - some may fail.
> 
> main           | DEBUG httpclient.wire       >> "POST / HTTP/1.1[\r][\n]"
> 
> main           | DEBUG httpclient.wire       >> "Content-type: 
> text/plain; charset=UTF-8[\r][\n]"
> 
> main           | DEBUG httpclient.wire       >> "User-Agent: Jakarta 
> Commons-HttpClient/2.0final[\r][\n]"
> 
> main           | DEBUG httpclient.wire       >> "Host: 
> localhost:1234[\r][\n]"
> 
> main           | DEBUG httpclient.wire       >> "Content-Length: 59[\r][\n]"
> 
> main           | DEBUG httpclient.wire       >> "[\r][\n]"
> 
> main           | DEBUG httpclient.wire       >> "These are the 
> characters - 
> Abcde;+[X[0xfffd][0xfffd][0xfffd][0xfffd][0xfffd][0xfffd][0xfffd][0xfffd][0xfffd][0xfffd]]
>  
> - some may "
> 
> main           | DEBUG httpclient.wire       << "HTTP/1.1 400 No Host 
> matches server name localhost[\r][\n]"
> 
> main           | DEBUG httpclient.wire       << "Transfer-Encoding: 
> chunked[\r][\n]"
> 
> main           | DEBUG httpclient.wire       << "Date: Wed, 16 Mar 2005 
> 20:56:59 GMT[\r][\n]"
> 
> main           | DEBUG httpclient.wire       << "Server: 
> Apache-Coyote/1.1[\r][\n]"
> 
> main           | DEBUG httpclient.wire       << "Connection: close[\r][\n]"
> 
> main           | DEBUG test.PPSTest          end.
> 
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> 
> 
> 
> Why do the Unicode "placeholder" characters result? Shouldn't the UTF-8 
> Encoding be able to handle them? Why is the output truncated?
> 
> Thanks.
> 
> 
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 


