[ https://issues.apache.org/jira/browse/SOLR-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13544560#comment-13544560 ]
Yonik Seeley commented on SOLR-4265:
------------------------------------
I did some manual testing, and one difference I noticed is that on IE10 (Windows
8), pasting this into the address bar:
http://rogue.local:8983/solr/query?q=héllo
results in:
{code}
HTTP ERROR 500
Problem accessing /solr/query. Reason:
{msg=Not valid UTF8! byte 6c in state 3,trace=org.eclipse.jetty.util.Utf8Appendable$NotUtf8Exception: Not valid UTF8! byte 6c in state 3
	at org.eclipse.jetty.util.Utf8Appendable.appendByte(Utf8Appendable.java:174)
	at org.eclipse.jetty.util.Utf8Appendable.append(Utf8Appendable.java:113)
	at org.eclipse.jetty.http.HttpURI.toUtf8String(HttpURI.java:503)
	at org.eclipse.jetty.http.HttpURI.getQuery(HttpURI.java:672)
	at org.eclipse.jetty.server.Request.getQueryString(Request.java:835)
	at org.apache.solr.servlet.StandardRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:395)
	at org.apache.solr.servlet.SolrRequestParsers.parse(SolrRequestParsers.java:115)
{code}
So it looks like IE10 doesn't percent-encode the international character, but it
wouldn't matter even if it did, because the result would be percent-encoded
Latin-1 rather than UTF-8. It would probably work with Tomcat, however (with or
without the current patch).
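To make that failure mode concrete, here is a minimal Java sketch (not Jetty's or Solr's actual code) of a strict query-string decoder: it percent-decodes into raw bytes and then decodes those bytes as UTF-8, reporting malformed input instead of replacing it. The class name StrictPercentDecode is hypothetical, and '+'-to-space handling is deliberately omitted. It shows that the Latin-1 escape %E9 for 'é' is rejected, while the UTF-8 escapes %C3%A9 decode cleanly.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictPercentDecode {

    // Hypothetical helper: percent-decodes s into raw bytes, then decodes those
    // bytes as UTF-8, throwing on malformed input instead of substituting U+FFFD.
    // '+' -> space handling is omitted for brevity.
    static String decode(String s) throws CharacterCodingException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '%') {
                if (i + 2 >= s.length()) {
                    throw new IllegalArgumentException("truncated escape at index " + i);
                }
                int hi = Character.digit(s.charAt(i + 1), 16);
                int lo = Character.digit(s.charAt(i + 2), 16);
                if (hi < 0 || lo < 0) {
                    throw new IllegalArgumentException("bad escape at index " + i);
                }
                bytes.write((hi << 4) | lo);
                i += 2;
            } else {
                // Unescaped characters are assumed to be ASCII (one byte each).
                bytes.write(c);
            }
        }
        CharsetDecoder utf8 = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return utf8.decode(ByteBuffer.wrap(bytes.toByteArray())).toString();
    }

    public static void main(String[] args) throws CharacterCodingException {
        // UTF-8 escapes for 'é' decode as intended.
        System.out.println(decode("h%C3%A9llo").equals("h\u00e9llo")); // true
        // The Latin-1 escape for 'é' (0xE9) is not valid UTF-8 here and is rejected.
        try {
            decode("h%E9llo");
            System.out.println("accepted (unexpected)");
        } catch (CharacterCodingException e) {
            System.out.println("rejected: " + e);
        }
    }
}
```

Byte 0xE9 announces a three-byte UTF-8 sequence, so the following 'l' (0x6c) is not a valid continuation byte, which matches the "byte 6c" in the Jetty error above.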
The old behavior did not result in an HTTP error, but I actually think this new
behavior is preferable! Before this patch, the request was simply wrong and did
not match the user's intentions. At least now it fails fast.
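The trade-off between silent corruption and failing fast can be illustrated with plain JDK decoding. This assumes the old code path behaved like a lenient decoder, which is an assumption for illustration and not Solr's actual pre-patch code; LenientVsStrict is a hypothetical name. Lenient decoding substitutes U+FFFD and lets a corrupted query through, while a strictly configured decoder throws, so the request can be rejected immediately.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class LenientVsStrict {

    // Raw query bytes as a client sending Latin-1 would produce them:
    // 'h', 0xE9 ('é' in ISO-8859-1), 'l', 'l', 'o'.
    static final byte[] LATIN1_HELLO = "h\u00e9llo".getBytes(StandardCharsets.ISO_8859_1);

    // Lenient: String(byte[], Charset) always replaces malformed input with
    // U+FFFD, so the request "succeeds" but searches for the wrong term.
    static String lenient(byte[] b) {
        return new String(b, StandardCharsets.UTF_8);
    }

    // Strict: a decoder configured to REPORT throws on malformed input,
    // letting the caller fail the request right away.
    static String strict(byte[] b) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(b))
                .toString();
    }

    public static void main(String[] args) {
        // Silently corrupted: contains the replacement character.
        System.out.println(lenient(LATIN1_HELLO).contains("\uFFFD")); // true
        try {
            strict(LATIN1_HELLO);
            System.out.println("accepted (unexpected)");
        } catch (CharacterCodingException e) {
            System.out.println("fails fast: " + e);
        }
    }
}
```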
> Fix decoding of GET/POST parameters for servlet containers with non-UTF-8 URL parsing (Tomcat)
> ----------------------------------------------------------------------------------------------
>
> Key: SOLR-4265
> URL: https://issues.apache.org/jira/browse/SOLR-4265
> Project: Solr
> Issue Type: Bug
> Components: web gui
> Affects Versions: 4.0
> Environment: Windows, but environment-independent
> Reporter: Alex Rocher
> Assignee: Uwe Schindler
> Attachments: SOLR-4265.patch, SOLR-4265.patch,
> SolrDispatchFilter.java.patch
>
>
> When you type an accented character (in French, for example) in the console query
> tester, there is no charset conversion (servlet request charset conversion).
> E.g.: "même" is converted into its ISO-8859-1 representation ==> fail.
> The reason: getCharacterEncoding() of the HTTP request is not checked. If it is
> null, it will assume UTF-8 for the conversion.
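A minimal sketch of the fallback described in the issue, assuming parameter decoding is driven by the request's declared charset: use getCharacterEncoding() when it returns a value, otherwise assume UTF-8. decodeParam is a hypothetical helper, and its declaredEncoding argument stands in for the return value of ServletRequest.getCharacterEncoding() so the example runs without a servlet container; this is not the SOLR-4265 patch itself. It also reproduces the "même" symptom: UTF-8 bytes decoded as ISO-8859-1 come out as "mÃªme".

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class ParamCharsetFallback {

    // Hypothetical helper: 'declaredEncoding' stands in for
    // ServletRequest.getCharacterEncoding(), which is null when the client
    // did not declare a charset. In that case UTF-8 is assumed.
    static String decodeParam(String raw, String declaredEncoding)
            throws UnsupportedEncodingException {
        String enc = (declaredEncoding != null) ? declaredEncoding : "UTF-8";
        return URLDecoder.decode(raw, enc);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String raw = "m%C3%AAme"; // "même" percent-encoded as UTF-8

        // Wrong charset: the two UTF-8 bytes C3 AA become "Ã" and "ª".
        String mojibake = URLDecoder.decode(raw, "ISO-8859-1");
        System.out.println(mojibake.equals("m\u00c3\u00aame")); // true

        // No declared charset: fall back to UTF-8 and get the intended term.
        System.out.println(decodeParam(raw, null).equals("m\u00eame")); // true
    }
}
```

Honoring a declared charset first keeps clients that really do send ISO-8859-1 working, while the UTF-8 default matches what browsers typically use for percent-encoded form data.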
--
This message is automatically generated by JIRA.