Andreas Kohn created SHINDIG-1981:
-------------------------------------
Summary: Wrong encoding
Key: SHINDIG-1981
URL: https://issues.apache.org/jira/browse/SHINDIG-1981
Project: Shindig
Issue Type: Bug
Components: Java
Affects Versions: 2.5.1
Reporter: Andreas Kohn
We're using RPC requests with multipart/form-data encoding when uploading
files. All encoding settings on both frontend and backend are set to UTF-8 to
handle non-ASCII content.
Even so, the content inside the 'request' object was still garbled.
Debugging showed that when JsonRpcServlet parses the request body, it assumes
that an item's encoding is either defined in the Content-Type header on that
item or, for non-file items without such a header, ISO-8859-1.
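A minimal sketch of the behaviour described above, assuming a hypothetical helper resolveItemCharset (illustrative only, not Shindig's actual code): the item's Content-Type charset wins if present, otherwise ISO-8859-1 is used. Decoding a UTF-8 field that arrives without a Content-Type header then turns each non-ASCII character into two Latin-1 characters.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical model of the pre-patch behaviour; not Shindig's actual code.
final class OldCharsetResolution {

    // Charset for a multipart item: the item's Content-Type charset if
    // present, otherwise ISO-8859-1 (the assumption described above).
    static Charset resolveItemCharset(String itemContentType) {
        String charset = charsetParam(itemContentType);
        return charset != null ? Charset.forName(charset) : StandardCharsets.ISO_8859_1;
    }

    // Returns the "charset" parameter of a Content-Type value, or null.
    static String charsetParam(String contentType) {
        if (contentType == null) {
            return null;
        }
        for (String param : contentType.split(";")) {
            String[] kv = param.trim().split("=", 2);
            if (kv.length == 2 && kv[0].trim().equalsIgnoreCase("charset")) {
                return kv[1].trim().replace("\"", "");
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // An HTML5 browser sends a non-file field as UTF-8 bytes and
        // without a Content-Type header on the part.
        byte[] fieldBytes = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        String decoded = new String(fieldBytes, resolveItemCharset(null));
        // The two UTF-8 bytes of the e-acute come back as two Latin-1
        // characters (U+00C3, U+00A9), so the 4-character value grows to 5.
        System.out.println(decoded.length()); // prints 5
    }
}
```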
In HTML5 this assumption no longer holds, as per
http://dev.w3.org/html5/spec-preview/constraints.html#multipart-form-data
{quote}
If the algorithm was invoked with an explicit character encoding, let the
selected character encoding be that encoding. (This algorithm is used by other
specifications, which provide an explicit character encoding to avoid the
dependency on the form element described in the next paragraph.)
Otherwise, if the form element has an accept-charset attribute, then, taking
into account the characters found in the form data set's names and values, and
the character encodings supported by the user agent, select a character
encoding from the list given in the form's accept-charset attribute that is an
ASCII-compatible character encoding. If none of the encodings are supported, or
if none are listed, then let the selected character encoding be UTF-8.
Otherwise, if the document's character encoding is an ASCII-compatible
character encoding, then that is the selected character encoding.
Otherwise, let the selected character encoding be UTF-8.
{quote}
and
{quote}
The parts of the generated multipart/form-data resource that correspond to
non-file fields must not have a Content-Type header specified. Their names and
values must be encoded using the character encoding selected above (field names
in particular do not get converted to a 7-bit safe encoding as suggested in RFC
2388).
{quote}
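The charset selection quoted above can be sketched roughly as follows. This is a simplified, hypothetical model (class and method names are illustrative): it approximates "ASCII-compatible" by checking that all 128 US-ASCII characters encode to the same bytes as in US-ASCII, and it picks the first supported accept-charset entry without looking at the form data itself. The point for the server side is that the selected charset is never written into the non-file parts.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Simplified, hypothetical model of the HTML5 multipart/form-data
// charset selection quoted above.
final class FormCharsetSelection {

    private static final String ASCII_PROBE = asciiProbe();

    // All 128 US-ASCII characters.
    private static String asciiProbe() {
        StringBuilder sb = new StringBuilder();
        for (char c = 0; c < 128; c++) {
            sb.append(c);
        }
        return sb.toString();
    }

    // Approximation of "ASCII-compatible": every 7-bit character must
    // encode to the same single byte as in US-ASCII.
    static boolean isAsciiCompatible(Charset cs) {
        return cs.canEncode()
                && Arrays.equals(ASCII_PROBE.getBytes(cs),
                                 ASCII_PROBE.getBytes(StandardCharsets.US_ASCII));
    }

    // explicit: encoding supplied by an invoking specification, or null.
    // acceptCharset: the form's accept-charset list, or null if absent.
    // documentEncoding: the document's character encoding, or null.
    static Charset selectEncoding(Charset explicit, List<String> acceptCharset,
                                  Charset documentEncoding) {
        if (explicit != null) {
            return explicit;
        }
        if (acceptCharset != null) {
            for (String name : acceptCharset) {
                try {
                    if (Charset.isSupported(name)) {
                        Charset cs = Charset.forName(name);
                        if (isAsciiCompatible(cs)) {
                            return cs;
                        }
                    }
                } catch (IllegalCharsetNameException e) {
                    // Malformed label: treat it as unsupported.
                }
            }
            // None of the listed encodings is usable, or none are listed.
            return StandardCharsets.UTF_8;
        }
        if (documentEncoding != null && isAsciiCompatible(documentEncoding)) {
            return documentEncoding;
        }
        return StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        // No explicit encoding, no accept-charset, UTF-8 document.
        System.out.println(selectEncoding(null, null, StandardCharsets.UTF_8));
    }
}
```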
The patch in review https://reviews.apache.org/r/24449/ fixes the problem for
us by using the request's character encoding as the default when an item's
Content-Type header does not specify a charset.
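An illustrative sketch of the resolution order described for the patch (not the patch itself; the class and helper names are hypothetical). The request encoding is passed in as a plain string, assumed to come from something like HttpServletRequest.getCharacterEncoding(), which can return null when the client did not declare one.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the patched resolution order; not the patch itself.
final class PatchedCharsetResolution {

    // 1. charset from the item's own Content-Type header,
    // 2. otherwise the request's encoding (may be null),
    // 3. otherwise the previous ISO-8859-1 default.
    static Charset resolveItemCharset(String itemContentType, String requestEncoding) {
        String fromItem = charsetParam(itemContentType);
        if (fromItem != null) {
            return Charset.forName(fromItem);
        }
        if (requestEncoding != null) {
            return Charset.forName(requestEncoding);
        }
        return StandardCharsets.ISO_8859_1;
    }

    // Returns the "charset" parameter of a Content-Type value, or null.
    static String charsetParam(String contentType) {
        if (contentType == null) {
            return null;
        }
        for (String param : contentType.split(";")) {
            String[] kv = param.trim().split("=", 2);
            if (kv.length == 2 && kv[0].trim().equalsIgnoreCase("charset")) {
                return kv[1].trim().replace("\"", "");
            }
        }
        return null;
    }

    public static void main(String[] args) {
        byte[] fieldBytes = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        // Non-file part: no Content-Type header, request encoding is UTF-8.
        String decoded = new String(fieldBytes, resolveItemCharset(null, "UTF-8"));
        System.out.println(decoded.equals("caf\u00e9")); // prints true
    }
}
```

Keeping the per-item charset first means parts that do declare one are decoded exactly as before; only parts without a declared charset change behaviour.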
I've tested this with Firefox on Linux and am currently checking that it still
works as expected with IE and Chrome.
--
This message was sent by Atlassian JIRA
(v6.2#6252)