[ 
https://issues.apache.org/jira/browse/SHINDIG-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Kohn updated SHINDIG-1981:
----------------------------------

    Summary: Wrong encoding of non-file form items in RPC requests with 
multipart/form-data  (was: Wrong encoding )

> Wrong encoding of non-file form items in RPC requests with multipart/form-data
> ------------------------------------------------------------------------------
>
>                 Key: SHINDIG-1981
>                 URL: https://issues.apache.org/jira/browse/SHINDIG-1981
>             Project: Shindig
>          Issue Type: Bug
>          Components: Java
>    Affects Versions: 2.5.1
>            Reporter: Andreas Kohn
>
> We're using RPC requests with multipart/form-data encoding when uploading 
> files. All encoding settings on both frontend and backend are configured to 
> UTF-8, to handle non-ASCII content.
> However, even with these settings, the non-ASCII content inside the 
> 'request' object still arrived garbled.
> Debugging showed that when JsonRpcServlet parses the request body, it 
> assumes that the encoding of a non-file item is either defined in the 
> Content-Type header on that item or, failing that, is ISO-8859-1.
> In HTML5 neither assumption is correct any longer, as per 
> http://dev.w3.org/html5/spec-preview/constraints.html#multipart-form-data
> {quote}
> If the algorithm was invoked with an explicit character encoding, let the 
> selected character encoding be that encoding. (This algorithm is used by 
> other specifications, which provide an explicit character encoding to avoid 
> the dependency on the form element described in the next paragraph.)
> Otherwise, if the form element has an accept-charset attribute, then, taking 
> into account the characters found in the form data set's names and values, 
> and the character encodings supported by the user agent, select a character 
> encoding from the list given in the form's accept-charset attribute that is 
> an ASCII-compatible character encoding. If none of the encodings are 
> supported, or if none are listed, then let the selected character encoding be 
> UTF-8.
> Otherwise, if the document's character encoding is an ASCII-compatible 
> character encoding, then that is the selected character encoding.
> Otherwise, let the selected character encoding be UTF-8.
> {quote}
> and
> {quote}
> The parts of the generated multipart/form-data resource that correspond to 
> non-file fields must not have a Content-Type header specified. Their names 
> and values must be encoded using the character encoding selected above (field 
> names in particular do not get converted to a 7-bit safe encoding as 
> suggested in RFC 2388).
> {quote}
> The patch in the review https://reviews.apache.org/r/24449/ fixes the problem 
> for us by using the request encoding as the default when the Content-Type 
> header does not specify any other encoding.
> I've tested this with Firefox on Linux, and am currently checking that it 
> still works as expected with IE and Chrome.
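
The fallback order described in the report can be sketched roughly as follows. This is not the patch from the review; the class FormFieldDecoder and its methods are hypothetical names made up for illustration, and the ISO-8859-1 last resort is an assumption that mirrors the old default. The sketch prefers a charset declared on the part itself, then the encoding of the enclosing request, and only then falls back to ISO-8859-1.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper, not part of Shindig: picks the charset used to decode
// a non-file multipart/form-data item.
final class FormFieldDecoder {

    // Returns the charset parameter of a Content-Type value, or null if absent.
    static String charsetParam(String contentType) {
        if (contentType == null) {
            return null;
        }
        for (String param : contentType.split(";")) {
            String p = param.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = p.substring("charset=".length()).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    // Order: charset on the part, then the request encoding, then ISO-8859-1
    // (assumed last resort). Charset.forName throws on unknown charset names.
    static Charset selectCharset(String partContentType, String requestEncoding) {
        String name = charsetParam(partContentType);
        if (name == null) {
            name = requestEncoding;
        }
        if (name == null) {
            return StandardCharsets.ISO_8859_1;
        }
        return Charset.forName(name);
    }

    // Decodes the raw bytes of a non-file item with the selected charset.
    static String decodeField(byte[] raw, String partContentType, String requestEncoding) {
        return new String(raw, selectCharset(partContentType, requestEncoding));
    }
}
```

In a servlet, the requestEncoding argument would typically come from HttpServletRequest.getCharacterEncoding(), which returns null when the client sent no charset. If the items are parsed with Apache Commons FileUpload, FileItem.getString(String encoding) could take the place of decodeField.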



--
This message was sent by Atlassian JIRA
(v6.2#6252)
