[ 
https://issues.apache.org/jira/browse/SOLR-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13544735#comment-13544735
 ] 

Uwe Schindler commented on SOLR-4265:
-------------------------------------

I am currently investigating how to be stricter with parameter encoding: 
Currently it is not an error if the %-encoded terms are not valid UTF-8 
(the JDK's URLDecoder replaces all invalid chars with ?). To make this stricter 
and fail correctly (instead of silently doing the wrong thing), we could use 
commons-codec's URLCodec (we already use that library) to do a binary URL decoding: 
http://commons.apache.org/codec/api-release/org/apache/commons/codec/net/URLCodec.html
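
To illustrate the lenient behavior described above, here is a minimal JDK-only 
sketch. The class and method names are hypothetical and not taken from the 
patch; the sample input m%EAme is the ISO-8859-1 form of "même" from the issue 
description. In the decoded String the invalid byte becomes the replacement 
character U+FFFD, which many consoles render as ?.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class; not part of Solr or of the attached patches.
class LenientDecodeDemo {

    /** Decodes with the JDK's URLDecoder, which does not report malformed UTF-8. */
    static String lenientDecode(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Cannot happen: every JVM is required to support UTF-8.
            throw new AssertionError("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        // %C3%AA is the valid UTF-8 encoding of U+00EA.
        String valid = lenientDecode("m%C3%AAme");
        // %EA is U+00EA in ISO-8859-1, but a broken sequence in UTF-8.
        String broken = lenientDecode("m%EAme");

        System.out.println("valid input round-trips: " + valid.equals("m\u00EAme"));
        System.out.println("broken input contains U+FFFD: " + (broken.indexOf('\uFFFD') >= 0));
    }
}
```

No exception is thrown for the second input, so the caller cannot tell that 
the client sent bytes in the wrong charset.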

URLCodec.decodeUrl takes byte[] and returns byte[]. Using this method we have 
full flexibility in throwing encoding errors. In that case we can also pass the 
byte[] contents from the POST stream directly! Should we do this or not? The 
current approach is lenient, like webservers that also accept almost any 
incorrectly encoded %XX stuff.
                
> Fix decoding of GET/POST parameters for servlet containers with non-UTF-8 URL 
> parsing (Tomcat)
> ----------------------------------------------------------------------------------------------
>
>                 Key: SOLR-4265
>                 URL: https://issues.apache.org/jira/browse/SOLR-4265
>             Project: Solr
>          Issue Type: Bug
>          Components: web gui
>    Affects Versions: 4.0
>         Environment: Windows, but environment independent
>            Reporter: Alex Rocher
>            Assignee: Uwe Schindler
>         Attachments: CropperCapture[4].png, CropperCapture[5].png, 
> CropperCapture[6].png, SOLR-4265.patch, SOLR-4265.patch, SOLR-4265.patch, 
> SolrDispatchFilter.java.patch
>
>
> When you type an accented character (in French, for example) in the console 
> query tester, there's no charset conversion (servlet request charset conversion).
> E.g.: "même" is converted into its ISO-8859-1 representation ==> fail
> The reason: getCharacterEncoding from HTTPRequest is not tested. If it's 
> null, it will assume a UTF-8 encoding charset for the conversion.

