[ 
https://issues.apache.org/jira/browse/SOLR-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13544764#comment-13544764
 ] 

Dawid Weiss commented on SOLR-4265:
-----------------------------------

I'm fine with this patch if you want to commit it, although I'd rather see 
out-of-the-box compliance with typical browser behavior. Namely, if you 
have a POST form with a target URL that contains (unescaped) Unicode 
characters, the browser will use the originating page's encoding for both the 
POST parameters and the URL. Say, something like:
{code}
    <form action="echo.jsp?abc=łódź" method="post" enctype="application/x-www-form-urlencoded">
      <input type="text" name="blah"><br>
      <input type="submit" value="Submit">
    </form>    
{code}

Enforcing UTF-8 for the URL while respecting the POST encoding for the body 
means we're creating custom rules, which, given the mess we talked about above, 
should probably be avoided. To me, the rules should be fairly simple:

- try to get character encoding from HTTP header; if not present, assume UTF-8
- decode the URI and the body (if POST) using the above encoding. If decoder 
failures occur, return HTTP BAD_REQUEST.
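
The two rules above could be sketched roughly as follows. This is only an illustration, not the code in the attached patch: the class StrictParamDecoder and its methods resolveCharset, decode, and statusFor are hypothetical names, and declaredEncoding stands in for whatever the container reports for the request (for example via getCharacterEncoding). The input is assumed to be the percent-encoded form as it arrives on the wire.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Solr or of the SOLR-4265 patch.
class StrictParamDecoder {

    // Rule 1: use the encoding declared in the HTTP header; assume UTF-8 if absent.
    static Charset resolveCharset(String declaredEncoding) {
        return declaredEncoding == null
                ? StandardCharsets.UTF_8
                : Charset.forName(declaredEncoding); // unchecked exception if unknown
    }

    // Rule 2: percent-decode to raw bytes, then decode those bytes strictly.
    static String decode(String encoded, Charset charset) throws CharacterCodingException {
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        for (int i = 0; i < encoded.length(); i++) {
            char c = encoded.charAt(i);
            if (c == '%') {
                boolean complete = i + 2 < encoded.length();
                int hi = complete ? Character.digit(encoded.charAt(i + 1), 16) : -1;
                int lo = complete ? Character.digit(encoded.charAt(i + 2), 16) : -1;
                if (hi < 0 || lo < 0) {
                    throw new IllegalArgumentException("Bad escape at index " + i);
                }
                raw.write((hi << 4) | lo);
                i += 2; // skip the two hex digits just consumed
            } else if (c == '+') {
                raw.write(' '); // form encoding uses '+' for a space
            } else if (c < 0x80) {
                raw.write(c);
            } else {
                throw new IllegalArgumentException("Unescaped non-ASCII char at index " + i);
            }
        }
        // REPORT makes the decoder throw instead of substituting U+FFFD.
        return charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(raw.toByteArray()))
                .toString();
    }

    // Maps any decoding failure to HTTP 400 (BAD_REQUEST), otherwise 200.
    static int statusFor(String encoded, String declaredEncoding) {
        try {
            decode(encoded, resolveCharset(declaredEncoding));
            return 200;
        } catch (CharacterCodingException | IllegalArgumentException e) {
            return 400;
        }
    }

    public static void main(String[] args) {
        System.out.println(statusFor("m%EAme", "ISO-8859-1")); // declared charset matches
        System.out.println(statusFor("m%EAme", null));         // no header: UTF-8 assumed
    }
}
```

With "même" submitted from an ISO-8859-1 page as m%EAme, the first call succeeds, while the second is rejected because the byte 0xEA followed by "m" is not valid UTF-8.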

The advantage here is that these rules won't surprise people who have simple 
HTML forms (which, for some reason, have a partial query string already 
hardcoded in the action attribute). If you apply your patch, you'll have to 
URL-escape the query string as UTF-8 yourself (assuming the page's encoding is 
not UTF-8).
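
To illustrate what escaping it yourself would mean, here is a minimal sketch using the standard java.net.URLEncoder. The class QueryEscapeDemo and its escape method are hypothetical names used only for this example. It shows that the same value produces different escapes depending on the charset, which is why a page in another encoding would have to pre-escape the query string as UTF-8 explicitly.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical demo class, not part of Solr.
class QueryEscapeDemo {

    // Form-encodes a value using the named charset.
    static String escape(String value, String charsetName) {
        try {
            return URLEncoder.encode(value, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String value = "m\u00EAme"; // "même", written with an escape to avoid source-encoding issues
        System.out.println(escape(value, "UTF-8"));      // m%C3%AAme
        System.out.println(escape(value, "ISO-8859-1")); // m%EAme
    }
}
```

Under the same approach, the "łódź" value from the form above would have to appear in the action attribute as %C5%82%C3%B3d%C5%BA for a server that enforces UTF-8 on the URL.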

Just a thought, I don't have a strong opinion about this.


                
> Fix decoding of GET/POST parameters for servlet containers with non-UTF-8 URL 
> parsing (Tomcat)
> ----------------------------------------------------------------------------------------------
>
>                 Key: SOLR-4265
>                 URL: https://issues.apache.org/jira/browse/SOLR-4265
>             Project: Solr
>          Issue Type: Bug
>          Components: web gui
>    Affects Versions: 4.0
>         Environment: Windows, but environment independent
>            Reporter: Alex Rocher
>            Assignee: Uwe Schindler
>         Attachments: CropperCapture[4].png, CropperCapture[5].png, 
> CropperCapture[6].png, SOLR-4265.patch, SOLR-4265.patch, SOLR-4265.patch, 
> SOLR-4265.patch, SolrDispatchFilter.java.patch
>
>
> When you type an accented character (in French, for example) in the console 
> query tester, no charset conversion is applied to the servlet request.
> E.g.: "même" is converted into its ISO-8859-1 representation ==> fail
> The reason: getCharacterEncoding from HTTPRequest is not tested. If it's 
> null, the conversion should assume UTF-8.
