[ 
https://issues.apache.org/jira/browse/SOLR-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13544217#comment-13544217
 ] 

Uwe Schindler edited comment on SOLR-4265 at 1/4/13 8:18 PM:
-------------------------------------------------------------

Alex: Solr expects all URL parameters to be encoded as UTF-8 - PERIOD. The
problem we are discussing here is that some servlet containers use ISO-8859-1
to decode the parameters, so although you pass UTF-8-URL-encoded values (e.g.
your example would be "q=m%C3%AAme"), the servlet container may not use UTF-8
to decode the %-encoded parts. This causes the issue you have seen. At present
this is a configuration issue: in Tomcat you have to change the connector, and
in Jetty you have to set the body encoding (sorry, Dawid, in Jetty this
definitely works).
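
To make the mismatch concrete, here is a minimal sketch that decodes the same %-encoded value with both charsets. The class name UrlDecodingDemo and its decode helper are made up for illustration; only java.net.URLDecoder is a real API, and this is not code from the patch.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class (not part of Solr): shows how the charset chosen
// for %-decoding changes the resulting string.
final class UrlDecodingDemo {

    // Thin wrapper around URLDecoder.decode so both charsets go through the same path.
    static String decode(String encoded, String charset) throws UnsupportedEncodingException {
        return URLDecoder.decode(encoded, charset);
    }

    public static void main(String[] args) throws Exception {
        String encoded = "m%C3%AAme"; // UTF-8 bytes C3 AA encode the letter U+00EA

        // Decoded as UTF-8, the two bytes become one character: m, U+00EA, m, e.
        String asUtf8 = decode(encoded, "UTF-8");

        // Decoded as ISO-8859-1, each byte becomes its own character:
        // m, U+00C3, U+00AA, m, e. This is the garbled form described above.
        String asLatin1 = decode(encoded, "ISO-8859-1");

        System.out.println("UTF-8 length:      " + asUtf8.length());
        System.out.println("ISO-8859-1 length: " + asLatin1.length());
    }
}
```

The lengths differ (4 versus 5 characters), which is an easy way to spot the wrong decoding without depending on how the console renders non-ASCII output.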

The HTTP protocol by itself has nothing to do with this. The whole issue is
about the request URI and the decoding of the URL parameters (the URLDecoder
Java class).

My proposal to fix this in a portable way (like we did by using
InputStreams/OutputStreams instead of Readers/Writers to avoid the buggy Jetty
Readers/Writers): for POST requests, let us set the body encoding to UTF-8 (as
demonstrated in the patch). And for the GET parameters, let's decode them
manually. It's just a series of String.split() and URLDecoder.decode(...,
"UTF-8") calls.
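
A rough sketch of what such manual decoding of GET parameters could look like. The class QueryStringParser and its parse method are hypothetical names, not the code from the attached patch. It assumes the raw, still %-encoded query string is available, for example from HttpServletRequest.getQueryString().

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not the actual Solr implementation): parses a raw
// query string and always decodes names and values as UTF-8, regardless of
// how the servlet container is configured.
final class QueryStringParser {

    static Map<String, List<String>> parse(String rawQuery) throws UnsupportedEncodingException {
        Map<String, List<String>> params = new LinkedHashMap<>();
        if (rawQuery == null || rawQuery.isEmpty()) {
            return params;
        }
        // Split into name=value pairs first, then decode each half separately,
        // so that an encoded '&' or '=' inside a value is not treated as a separator.
        for (String pair : rawQuery.split("&")) {
            if (pair.isEmpty()) {
                continue; // tolerate "a=1&&b=2"
            }
            String[] nameAndValue = pair.split("=", 2);
            String name = URLDecoder.decode(nameAndValue[0], "UTF-8");
            String value = nameAndValue.length > 1
                    ? URLDecoder.decode(nameAndValue[1], "UTF-8")
                    : ""; // parameter without '=' gets an empty value
            params.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }
        return params;
    }

    public static void main(String[] args) throws Exception {
        Map<String, List<String>> params = parse("q=m%C3%AAme&fq=a+b&fq=c");
        // "fq" keeps both values in order; '+' is decoded to a space.
        System.out.println(params.get("fq"));
    }
}
```

Repeated parameters are collected into a list because Solr requests commonly pass the same name more than once (for example several fq values). Malformed %-sequences make URLDecoder.decode throw IllegalArgumentException, which a real implementation would need to handle.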
                
> Encoding problem from test console
> ----------------------------------
>
>                 Key: SOLR-4265
>                 URL: https://issues.apache.org/jira/browse/SOLR-4265
>             Project: Solr
>          Issue Type: Bug
>          Components: web gui
>    Affects Versions: 4.0
>         Environment: Windows, but environment independent
>            Reporter: Alex Rocher
>            Priority: Blocker
>         Attachments: SolrDispatchFilter.java.patch
>
>
> When you type an accented character (in French, for example) in the console
> query tester, there's no charset conversion (servlet request charset
> conversion).
> E.g.: "même" is converted into its ISO-8859-1 representation ==> fail
> The reason: getCharacterEncoding from HTTPRequest is not tested. If it's
> null, it will assume to convert a UTF-8 encoding charset.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
