[ 
https://issues.apache.org/jira/browse/SOLR-14189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023769#comment-17023769
 ] 

ASF subversion and git services commented on SOLR-14189:
--------------------------------------------------------

Commit e934c8a7caee42565bd4c3982e6b46a561ebecfe in lucene-solr's branch 
refs/heads/branch_8x from Uwe Schindler
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=e934c8a ]

SOLR-14189: Add changes entry


> Some whitespace characters bypass zero-length test in query parsers leading 
> to 400 Bad Request
> ----------------------------------------------------------------------------------------------
>
>                 Key: SOLR-14189
>                 URL: https://issues.apache.org/jira/browse/SOLR-14189
>             Project: Solr
>          Issue Type: Improvement
>          Components: query parsers
>            Reporter: Andy Webb
>            Assignee: Uwe Schindler
>            Priority: Major
>             Fix For: master (9.0), 8.5
>
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> The edismax and some other query parsers treat pure whitespace queries as 
> empty queries, but they use Java's {{String.trim()}} method to normalise 
> queries. That method [only treats characters 0-32 as 
> whitespace|https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#trim--].
>  Other whitespace characters exist - such as {{U+3000 IDEOGRAPHIC SPACE}} - 
> which bypass the test and lead to {{400 Bad Request}} responses - see for 
> example {{/solr/mycollection/select?q=%E3%80%80&defType=edismax}} vs 
> {{/solr/mycollection/select?q=%20&defType=edismax}}. The first fails with the 
> exception:
> {noformat}
> org.apache.solr.search.SyntaxError: Cannot parse '': Encountered "<EOF>" at 
> line 1, column 0. Was expecting one of: <NOT> ... "+" ... "-" ... <BAREOPER> 
> ... "(" ... "*" ... <QUOTED> ... <TERM> ... <PREFIXTERM> ... <WILDTERM> ... 
> <REGEXPTERM> ... "[" ... "{" ... <LPARAMS> ... "filter(" ... <NUMBER> ... 
> <TERM> ...
> {noformat}
> [PR 1172|https://github.com/apache/lucene-solr/pull/1172] updates the dismax, 
> edismax and rerank query parsers to use 
> [StringUtils.isWhitespace()|https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html#isWhitespace-java.lang.String-]
>  which checks every character with {{Character.isWhitespace()}} and so also 
> recognises Unicode whitespace such as U+3000.
> Prior to the change, rerank behaves differently for U+3000 and U+0020, as 
> shown by the two requests below. With the change, both give the "mandatory 
> parameter" message:
> {{q=greetings&rq=\{!rerank%20reRankQuery=$rqq%20reRankDocs=1000%20reRankWeight=3}&rqq=%E3%80%80}}
>  - before the change: generic 400 Bad Request
> {{q=greetings&rq=\{!rerank%20reRankQuery=$rqq%20reRankDocs=1000%20reRankWeight=3}&rqq=%20}}
>  - before the change: 400 reporting "reRankQuery parameter is mandatory"
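
The difference described in the issue can be reproduced with the JDK alone. The sketch below is illustrative and is not the code from the PR: the class WhitespaceCheck and both helper methods are hypothetical names, and isAllWhitespace() only approximates the per-character check that StringUtils.isWhitespace() performs, using Character.isWhitespace() directly so that no extra dependency is needed.

```java
final class WhitespaceCheck {

    // Hypothetical helper, not Solr's code: true when every char is
    // whitespace according to Character.isWhitespace(). An empty string
    // also counts as all-whitespace.
    static boolean isAllWhitespace(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isWhitespace(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // The trim()-based emptiness test described in the issue. trim() only
    // strips leading and trailing chars with code points up to U+0020.
    static boolean isEmptyAfterTrim(String s) {
        return s.trim().isEmpty();
    }

    public static void main(String[] args) {
        String asciiSpace = " ";        // what q=%20 decodes to
        String ideographic = "\u3000";  // what q=%E3%80%80 decodes to

        System.out.println("trim, U+0020:         " + isEmptyAfterTrim(asciiSpace));
        System.out.println("trim, U+3000:         " + isEmptyAfterTrim(ideographic));
        System.out.println("isWhitespace, U+0020: " + isAllWhitespace(asciiSpace));
        System.out.println("isWhitespace, U+3000: " + isAllWhitespace(ideographic));
    }
}
```

With trim(), only the U+0020 query is seen as empty, while the per-character check treats both as empty. Character.isWhitespace() does not count non-breaking spaces such as U+00A0, so a query made up only of those would presumably still reach the parser.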



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

