[
https://issues.apache.org/jira/browse/SOLR-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13544217#comment-13544217
]
Uwe Schindler edited comment on SOLR-4265 at 1/4/13 8:18 PM:
-------------------------------------------------------------
Alex: Solr expects all URL parameters encoded as UTF-8 - PERIOD. The problem we
are discussing here is that some servlet containers use ISO-8859-1 to decode
the parameters, so although you pass UTF-8-URL-encoded values (e.g. your
example would be "q=m%C3%AAme"), the servlet container may not use UTF-8 to
decode the %-encoded parts. This causes the issue you have seen. And this is
currently a configuration issue: in Tomcat you have to change the connector, in
Jetty you have to set the body encoding (sorry, Dawid, in Jetty this definitely
works).
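To illustrate the mismatch, here is a minimal sketch that decodes the example
value with both charsets. The class and method names are made up for
illustration and are not part of Solr or the attached patch; only
java.net.URLDecoder is a real API here.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class, not part of Solr or the patch.
final class DecodeMismatchDemo {

    /** Decodes a %-encoded value using the named charset. */
    static String decode(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            // UTF-8 and ISO-8859-1 are required on every JVM, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String encoded = "m%C3%AAme"; // the UTF-8-URL-encoded example value

        // Decoded as UTF-8: the two bytes C3 AA form one character, giving 4 chars.
        String asUtf8 = decode(encoded, "UTF-8");

        // Decoded as ISO-8859-1: each byte becomes its own character, giving 5 chars.
        String asLatin1 = decode(encoded, "ISO-8859-1");

        System.out.println(asUtf8 + " (" + asUtf8.length() + " chars)");
        System.out.println(asLatin1 + " (" + asLatin1.length() + " chars)");
    }
}
```

The second result is what a container that decodes with ISO-8859-1 would hand
to the application.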
The HTTP protocol by itself has nothing to do with this. The whole issue is
about the request URI and the decoding of the URL parameters (the URLDecoder
Java class).
My proposal to fix this in a portable way (like we did when we used
InputStreams/OutputStreams instead of Readers/Writers to avoid the buggy Jetty
Readers/Writers): For POST requests, let us set the body encoding (as
demonstrated in the patch) to UTF-8. And for the GET parameters, let's decode
them manually. It's just a series of String.split() and URLDecoder.decode(...,
"UTF-8") calls.
> Encoding problem from test console
> ----------------------------------
>
> Key: SOLR-4265
> URL: https://issues.apache.org/jira/browse/SOLR-4265
> Project: Solr
> Issue Type: Bug
> Components: web gui
> Affects Versions: 4.0
> Environment: Windows, but environment independent
> Reporter: Alex Rocher
> Priority: Blocker
> Attachments: SolrDispatchFilter.java.patch
>
>
> When you type an accent (in the French language, for example) in the console
> query tester, there's no charset conversion (servlet request charset
> conversion).
> E.g.: "même" is converted into its ISO-8859-1 representation ==> fail
> The reason: getCharacterEncoding from HTTPRequest is not tested. If it's
> null, it will assume a UTF-8 charset for the conversion.