Dear solr-dev,

When I make a request to a URI like /solr/my_core/query?q=%C0, I get an HTTP
500 status code with a stack trace originating at

org.apache.solr.common.SolrException: URLDecoder: Invalid character encoding detected after position 2 of query string / form data (while parsing as UTF-8)
        at org.apache.solr.servlet.SolrRequestParsers.decodeChars(SolrRequestParsers.java:421)
…

The obvious reason is that the `q` parameter value looks like the first byte of
a multibyte UTF-8 sequence, but that sequence is incomplete/invalid. I have
seen a few more instances of this in our monitoring, some of them surfacing in
different places.
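
To make the failure mode concrete, here is a minimal sketch using only the JDK (not Solr's own decoder) that checks whether a byte sequence is well-formed UTF-8. The class and method names are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictUtf8Check {

    // Returns true only if the bytes form a complete, well-formed UTF-8 sequence.
    // Hypothetical helper for illustration; Solr uses its own decoding logic.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                // Report errors instead of silently substituting U+FFFD.
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // %C0 percent-decodes to the single byte 0xC0.
        System.out.println(isValidUtf8(new byte[] {(byte) 0xC0}));
        // For comparison: 0xC3 0xA9 is the two-byte encoding of U+00E9.
        System.out.println(isValidUtf8(new byte[] {(byte) 0xC3, (byte) 0xA9}));
    }
}
```

The lone 0xC0 byte is rejected, while the complete two-byte sequence is accepted.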

The question I’d like to ask is whether there is any particular reason why this
leads to an HTTP 500 status code.

Wouldn’t something like HTTP 400 (Bad Request) make more sense? At least
in my case, it would make processing in the downstream systems (which have to
deal with Solr’s response) much easier if I could recognize this class of
errors.

Also, at the place where the exception is thrown
(https://github.com/apache/solr/blob/releases/lucene-solr/7.7.3/solr/core/src/java/org/apache/solr/servlet/SolrRequestParsers.java#L419-L422),
care was taken to use the `ErrorCode.BAD_REQUEST` status. This information,
however, seems to be lost along the way.

So, my question is: Is there a good reason why the status code from the
`SolrException` is not being propagated to the HTTP response in a central
location? Does this deserve a bug report, or is there a good reason why it
cannot easily be fixed, or was it even designed this way?
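
To illustrate what I mean by propagating the code in a central location, here is a rough sketch. It assumes, purely for illustration, that the code gets lost because the original exception ends up wrapped in another one; I have not verified that this is what happens in Solr. `CodedException` and `httpStatusFor` are hypothetical stand-ins and not Solr classes or methods.

```java
public class StatusPropagation {

    // Hypothetical stand-in for an exception that carries an HTTP status code.
    static class CodedException extends RuntimeException {
        final int code;

        CodedException(int code, String message) {
            super(message);
            this.code = code;
        }
    }

    // Walks the cause chain and returns the first status code found.
    // Falls back to 500 when no exception in the chain carries a code.
    static int httpStatusFor(Throwable t) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            if (cur instanceof CodedException) {
                return ((CodedException) cur).code;
            }
        }
        return 500;
    }

    public static void main(String[] args) {
        Throwable wrapped = new RuntimeException("request failed",
                new CodedException(400, "Invalid character encoding"));
        System.out.println(httpStatusFor(wrapped));
        System.out.println(httpStatusFor(new IllegalStateException("boom")));
    }
}
```

With something along these lines, a wrapped bad-request error would still map to 400, and only errors without any code would fall back to 500.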

Thanks!
-mp.
