[ http://issues.apache.org/struts/browse/STR-1941?page=all ]
David Evans closed STR-1941:
----------------------------
Resolution: Fixed
> double UTF-8 encoding of HTTP request parameters
> ------------------------------------------------
>
> Key: STR-1941
> URL: http://issues.apache.org/struts/browse/STR-1941
> Project: Struts Action 1
> Type: Bug
> Components: Action
> Versions: Nightly Build
> Environment: Operating System: other
> Platform: Other
> Reporter: Akos Maroy
> Assignee: David Evans
>
> I'm having a problem properly processing UTF-8 encoded request parameters
> through Struts. The effect is that international characters (non-ASCII
> characters, which are multi-byte in UTF-8) are encoded into UTF-8 twice.
> As an example, take the examples webapp included in the jakarta-struts
> source tree. It has the registration sample, reachable at
> http://localhost:8080/struts-examples/validator/registration.do
> if installed on localhost:8080. Let's suppose I wish to type:
> small letter a with acute: á
> unicode value hex: 00e1
> unicode value binary: 11100001
> UTF-8 binary: 11000011 10100001
> UTF-8 in hex: c3a1
> into the firstName field of the form. This can be simulated by:
> http://localhost:8080/struts-examples/validator/registration-submit.do?firstName=%C3%A1
> (typing it manually and submitting via POST has the same effect)
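
The percent-escapes in that URL are simply the UTF-8 bytes of the letter written in hex. A minimal sketch that reproduces them, assuming a hypothetical helper class Utf8QueryParam that is not part of Struts or the examples webapp:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of Struts.
class Utf8QueryParam {

    // Percent-encode every UTF-8 byte of the input. This is enough for the
    // demo; a real encoder would leave unreserved ASCII characters as-is.
    static String percentEncodeUtf8(String text) {
        StringBuilder out = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            out.append(String.format("%%%02X", b & 0xFF));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String aAcute = "\u00e1"; // small letter a with acute
        System.out.println("firstName=" + percentEncodeUtf8(aAcute)); // firstName=%C3%A1
    }
}
```

The letter is written as the escape \u00e1 so the result does not depend on the encoding of the source file.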
> The resulting page shows a lot of form errors, as I didn't fill out most of
> the fields, which is OK. More importantly, it also shows the entered letter in
> the firstName input field. What is weird is that a different letter is shown
> (actually two letters). Running xxd on the received page, here's the relevant
> part:
> 00003a0: 6e67 7468 3d22 3330 2220 7369 7a65 3d22 ngth="30" size="
> 00003b0: 3330 2220 7661 6c75 653d 22c3 83c2 a122 30" value="...."
> 00003c0: 3e0a 2020 2020 3c2f 7464 3e0a 2020 3c2f >. </td>. </
> with the important part at value="....", which is:
> 00003b0: 3330 2220 7661 6c75 653d 22c3 83c2 a122 30" value="...."
>                                     ^^^^^^^^^^
> the letters presented are:
> UTF-8 hex sequence: c383c2a1
> UTF-8 binary: 11000011 10000011 11000010 10100001
> which is actually two UTF-8 letters by now. What is funny is that if I 'decode'
> them from UTF-8, I get the original UTF-8 sequence:
> first part, as received: 11000011 10000011
> de-coded: 11000011
> second part, as received: 11000010 10100001
> de-coded: 10100001
> and voila, the parts make up the original UTF-8 sequence:
> 11000011 10100001
> which actually is the UTF-8 sequence for the letter sent.
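
The byte pattern above is what you get when UTF-8 bytes are read as single-byte ISO-8859-1 characters and then written out as UTF-8 again. The report does not say where this happens, so the following is only a sketch of the byte arithmetic, with DoubleEncodingDemo and its methods being hypothetical names:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it only mirrors the byte sequences in the report.
class DoubleEncodingDemo {

    // Misread UTF-8 bytes as ISO-8859-1 (one char per byte), then encode
    // the resulting characters as UTF-8 a second time.
    static byte[] encodeAgain(byte[] utf8) {
        String misread = new String(utf8, StandardCharsets.ISO_8859_1);
        return misread.getBytes(StandardCharsets.UTF_8);
    }

    // Reverse step: decode as UTF-8, then write each character back as a
    // single ISO-8859-1 byte. This is the 'decode' done by hand above.
    static byte[] undo(byte[] doubled) {
        String chars = new String(doubled, StandardCharsets.UTF_8);
        return chars.getBytes(StandardCharsets.ISO_8859_1);
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] sent = "\u00e1".getBytes(StandardCharsets.UTF_8);
        byte[] received = encodeAgain(sent);
        System.out.println(hex(sent));           // c3a1
        System.out.println(hex(received));       // c383c2a1
        System.out.println(hex(undo(received))); // c3a1
    }
}
```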
> If I resend this page (by now with two UTF-8 letters), I get four letters, then
> 8, etc. It seems that the engine doesn't recognize that these are UTF-8
> sequences to begin with, and encodes them 'again'.
> I'm using Mozilla as a browser and Tomcat 5.0.16. The encoding of the pages is
> UTF-8.
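
One explanation consistent with the doubling (two, four, then eight letters) is that the parameter bytes are decoded as ISO-8859-1 instead of UTF-8 on each submit. That is an assumption, not something the report confirms. The sketch below uses only java.net.URLDecoder from the JDK; ParamDecodingDemo and resubmit are hypothetical names:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it does not use Struts or the servlet API.
class ParamDecodingDemo {

    // Decode a percent-encoded parameter value with the given charset name.
    static String decode(String raw, String charsetName) {
        try {
            return URLDecoder.decode(raw, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // One round trip under the assumed fault: the value is sent as UTF-8
    // bytes, and those bytes are then decoded as ISO-8859-1.
    static String resubmit(String value) {
        byte[] onTheWire = value.getBytes(StandardCharsets.UTF_8);
        return new String(onTheWire, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(decode("%C3%A1", "UTF-8").length());      // 1
        System.out.println(decode("%C3%A1", "ISO-8859-1").length()); // 2

        String value = "\u00e1";
        for (int i = 0; i < 3; i++) {
            value = resubmit(value);
            System.out.println(value.length()); // 2, then 4, then 8
        }
    }
}
```

Every character produced by the wrong decode lies in the range U+0080 to U+00FF, which takes two bytes in UTF-8, so the letter count doubles on each resubmit.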
--
This message is automatically generated by JIRA.