[ 
https://issues.apache.org/jira/browse/SOLR-4283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler resolved SOLR-4283.
---------------------------------

    Resolution: Fixed

Committed to trunk and 4.x.

A next step would be to make the encoding of the GET URLs configurable (using 
the de facto standard "&ie=charset" URL parameter, as used by most REST 
web services of major search engines).
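
Purely as an illustration of that idea (it is not part of the committed patch), resolving an "ie" parameter to a charset could look roughly like the sketch below. The class and method names are hypothetical, and falling back to UTF-8 when the parameter is absent is an assumption.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;

// Hypothetical helper, not taken from the SOLR-4283 patch.
final class InputEncodingSketch {

  /** Maps the value of an "ie" URL parameter to a Charset; assumes UTF-8 when it is missing. */
  static Charset resolve(String ie) {
    if (ie == null || ie.isEmpty()) {
      return StandardCharsets.UTF_8; // assumed default
    }
    try {
      return Charset.forName(ie);
    } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
      // Reject unknown charsets instead of silently decoding with the wrong one.
      throw new IllegalArgumentException("Unsupported ie charset: " + ie, e);
    }
  }
}
```

A request such as "/select?q=...&ie=ISO-8859-1" would then have its GET parameters decoded with the charset returned by resolve("ISO-8859-1").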
                
> Improve URL decoding (followup of SOLR-4265)
> --------------------------------------------
>
>                 Key: SOLR-4283
>                 URL: https://issues.apache.org/jira/browse/SOLR-4283
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 4.0
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 4.1, 5.0
>
>         Attachments: index.jsp, request.http, SOLR-4283.patch, 
> SOLR-4283.patch, SOLR-4283.patch, SOLR-4283.patch, SOLR-4283.patch
>
>
> Followup of SOLR-4265:
> SOLR-4265 has 2 problems:
> - It reads the whole InputStream into a String, and this one can be big. This 
> wastes memory, especially when the query string from the POSTed form data is 
> near the 2 megabyte limit. The String is then split and packed into a 
> big Map.
> - It does not report corrupt UTF-8.
> The attached patch will do 2 things:
> - The decoding of the POSTed form data is done on the ServletInputStream, 
> directly parsing the bytes (not chars). Key/value pairs are extracted and 
> %-decoded to byte[] on the fly. URL parameters from getQueryString() are 
> parsed with the same code, using a ByteArrayInputStream on the original 
> String, interpreted as UTF-8 (this is a hack, because the Servlet API does 
> not give back the original bytes from the HTTP request). To be 
> standards-conformant, the query String should be interpreted as US-ASCII, 
> but with this approach, UTF-8 from the HTTP request that is not fully 
> escaped survives.
> - The byte[] key/value pairs are converted to Strings using CharsetDecoder.
> This will be memory efficient and will report incorrectly escaped form data, 
> so people will no longer complain if searches hit no results or similar.
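
The two steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from SOLR-4283.patch: the class and method names are hypothetical, the charset is fixed to UTF-8, and the 2 megabyte limit is not enforced. It reads form data byte by byte, %-decodes keys and values into byte buffers, and converts them to Strings with a CharsetDecoder configured to report malformed input.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the actual Solr implementation.
final class FormDataSketch {

  /** Parses form-urlencoded bytes from the stream without first building one big String. */
  static Map<String, List<String>> parse(InputStream in) throws IOException {
    Map<String, List<String>> params = new LinkedHashMap<>();
    ByteArrayOutputStream key = new ByteArrayOutputStream();
    ByteArrayOutputStream value = new ByteArrayOutputStream();
    ByteArrayOutputStream current = key;
    int b;
    // A real servlet stream should be wrapped in a BufferedInputStream first.
    while ((b = in.read()) != -1) {
      switch (b) {
        case '&': // end of one key/value pair
          emit(params, key, value);
          current = key;
          break;
        case '=': // the first '=' switches from key to value
          if (current == key) {
            current = value;
          } else {
            current.write(b);
          }
          break;
        case '+': // '+' encodes a space in form data
          current.write(' ');
          break;
        case '%': { // %XX escape, decoded to a single raw byte
          int hi = hexDigit(in.read());
          int lo = hexDigit(in.read());
          current.write((hi << 4) | lo);
          break;
        }
        default:
          current.write(b);
      }
    }
    emit(params, key, value);
    return params;
  }

  private static void emit(Map<String, List<String>> params,
      ByteArrayOutputStream key, ByteArrayOutputStream value) throws CharacterCodingException {
    if (key.size() > 0 || value.size() > 0) {
      params.computeIfAbsent(decodeUtf8(key.toByteArray()), k -> new ArrayList<>())
          .add(decodeUtf8(value.toByteArray()));
    }
    key.reset();
    value.reset();
  }

  /** Strict decoding: corrupt UTF-8 raises CharacterCodingException instead of yielding U+FFFD. */
  static String decodeUtf8(byte[] bytes) throws CharacterCodingException {
    CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
        .onMalformedInput(CodingErrorAction.REPORT)
        .onUnmappableCharacter(CodingErrorAction.REPORT);
    return decoder.decode(ByteBuffer.wrap(bytes)).toString();
  }

  private static int hexDigit(int b) throws IOException {
    if (b >= '0' && b <= '9') return b - '0';
    if (b >= 'a' && b <= 'f') return b - 'a' + 10;
    if (b >= 'A' && b <= 'F') return b - 'A' + 10;
    throw new IOException("Invalid %-escape in form data");
  }
}
```

The same parse method could be applied to the result of getQueryString() by wrapping its UTF-8 bytes in a ByteArrayInputStream, which mirrors the hack mentioned in the description.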

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
