[
https://issues.apache.org/jira/browse/SOLR-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13544774#comment-13544774
]
Uwe Schindler commented on SOLR-4265:
-------------------------------------
bq. The URL is invalid in the HTML source but browsers will "fix" it for you
(url-escape) and many people accept it as something ordinary.
That's the whole problem, I agree. But we don't know which encoding the browser
used when it escaped the URL, so without a charset given somewhere we cannot
decode the URL. You have this problem with both GET and POST requests. Because
of this you cannot fix this in Solr.
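
To make the ambiguity concrete, here is a minimal sketch (not part of the
patch) of how the same text ends up as different escaped bytes depending on
the charset the browser picked. The class and method names are made up for
illustration, and it assumes Java 10+ for the Charset overloads of
URLEncoder/URLDecoder.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, only for illustration; not Solr code.
class UrlCharsetAmbiguity {

    // What a browser might send if it url-escapes using the given charset.
    static String escape(String text, Charset browserCharset) {
        return URLEncoder.encode(text, browserCharset);
    }

    // What a server sees if it decodes the escaped URL with the given charset.
    static String unescape(String escaped, Charset serverCharset) {
        return URLDecoder.decode(escaped, serverCharset);
    }

    // True only if a server that always decodes as UTF-8 gets the original text back.
    static boolean survivesUtf8Decoding(String text, Charset browserCharset) {
        String escaped = escape(text, browserCharset);
        return unescape(escaped, StandardCharsets.UTF_8).equals(text);
    }
}
```

With this, "même" escaped as ISO-8859-1 is m%EAme, while escaped as UTF-8 it is
m%C3%AAme. Nothing in the escaped string says which one was used, and decoding
the UTF-8 form as ISO-8859-1 gives "mÃªme".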
The patch does not change any of the previous charset-guessing behaviour; the
*only* change here is the encoding used to decode URLs (which is now always
"UTF-8", because web containers handle this in different ways). Jetty and
Tomcat both handled POST content respecting the charset of the POST BODY - and
that did not change.
Where is your problem with Solr? The whole discussion could be flame-wared on
the Jetty or Tomcat lists as before; unfortunately, the HTTP spec, the Servlet
spec and the URL spec are not precise enough. For Solr it is not an issue: Solr
is documented to only accept URL-encoded request parameters as UTF-8.
The only way to change this would be to do it like search engines do. They
allow you to pass in an extra "ie=" GET parameter that defines the "input
encoding" of the URL parameters. In that case you could do a 2-step URL parsing
approach (or use commons-codec): decode the binary URL from the byte[], then
interpret the "ie" parameter as US-ASCII and use it to decode the remaining
parameters.
> Fix decoding of GET/POST parameters for servlet containers with non-UTF-8 URL
> parsing (Tomcat)
> ----------------------------------------------------------------------------------------------
>
> Key: SOLR-4265
> URL: https://issues.apache.org/jira/browse/SOLR-4265
> Project: Solr
> Issue Type: Bug
> Components: web gui
> Affects Versions: 4.0
> Environment: Windows but, environment independent
> Reporter: Alex Rocher
> Assignee: Uwe Schindler
> Attachments: CropperCapture[4].png, CropperCapture[5].png,
> CropperCapture[6].png, SOLR-4265.patch, SOLR-4265.patch, SOLR-4265.patch,
> SOLR-4265.patch, SOLR-4265.patch, SolrDispatchFilter.java.patch
>
>
> When you type an accent (in french language for example) in the console query
> tester, there's no charset conversion (servlet request charset conversion)
> E.g.: "même" is converted into its ISO-8859-1 representation ==> fail
> The reason: getCharacterEncoding from HTTPRequest is not tested. If it's
> null, it will assume a UTF-8 charset for the conversion.