Dear solr-dev,
when I make a request to a URI like /solr/my_core/query?q=%C0, I get an HTTP
500 status code with a stack trace originating at
org.apache.solr.common.SolrException: URLDecoder: Invalid character encoding detected after position 2 of query string / form data (while parsing as UTF-8)
    at org.apache.solr.servlet.SolrRequestParsers.decodeChars(SolrRequestParsers.java:421)
…
The obvious reason is that the `q` parameter value looks like the first byte of
a multibyte UTF-8 sequence, but that sequence is incomplete and therefore
invalid. I have seen a few more instances of this in our monitoring, some of
them surfacing in different places.
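To illustrate why the value is rejected, here is a minimal sketch in plain Java (not Solr's actual decoding code) that runs a strict UTF-8 decoder over the single byte 0xC0, which is what `%C0` percent-decodes to. The class and method names are made up for this example. As far as I understand UTF-8, 0xC0 has the bit pattern of a two-byte lead byte but is never valid, because it could only start an overlong encoding.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of Solr.
final class StrictUtf8Check {

    // Returns true only if the bytes form a well-formed UTF-8 sequence.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                // Report malformed input instead of silently substituting U+FFFD.
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The byte that q=%C0 decodes to: rejected.
        System.out.println(isValidUtf8(new byte[] {(byte) 0xC0}));              // prints false
        // A complete two-byte sequence (U+00E4) for comparison: accepted.
        System.out.println(isValidUtf8(new byte[] {(byte) 0xC3, (byte) 0xA4})); // prints true
    }
}
```

A malformed query string like this is purely a client-side mistake, which is why it reads to me as a Bad Request rather than a server error.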
The question I’d like to ask is whether there is any particular reason why this
leads to an HTTP 500 status code.
Wouldn’t something like HTTP 400 (Bad Request) make more sense? At least in my
case, it would make processing in the downstream systems (which have to deal
with Solr’s response) much easier if I could recognize this class of errors.
Also, if I look at the place where the exception is being thrown
(https://github.com/apache/solr/blob/releases/lucene-solr/7.7.3/solr/core/src/java/org/apache/solr/servlet/SolrRequestParsers.java#L419-L422),
care was taken to use the `ErrorCode.BAD_REQUEST` status. This information,
however, seems to be lost along the way.
So, my question is: Is there a good reason why the status code from the
`SolrException` is not propagated to the HTTP response in a central location?
Does this deserve a bug report, or is there a reason why it cannot easily be
fixed, or was it even designed this way?
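To make concrete what I mean by propagating the code in a central location, here is a minimal, self-contained sketch. It is not Solr's code: `CodedException` is a hypothetical stand-in for an exception that carries a status code the way `SolrException` does, and `httpStatusFor` is a made-up helper. It also assumes that the code gets lost because the exception is wrapped somewhere along the way, which is only a guess on my part.

```java
// Illustration only; all names here are hypothetical and not Solr APIs.
final class StatusPropagationSketch {

    // Stand-in for an exception that carries an HTTP-like status code.
    static final class CodedException extends RuntimeException {
        final int code;

        CodedException(int code, String message) {
            super(message);
            this.code = code;
        }
    }

    // Walks the cause chain and returns the first carried 4xx/5xx code,
    // falling back to 500 when no exception in the chain carries one.
    static int httpStatusFor(Throwable t) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            if (cur instanceof CodedException) {
                int code = ((CodedException) cur).code;
                if (code >= 400 && code <= 599) {
                    return code;
                }
            }
        }
        return 500;
    }

    public static void main(String[] args) {
        Throwable wrapped = new RuntimeException("request failed",
                new CodedException(400, "Invalid character encoding"));
        System.out.println(httpStatusFor(wrapped));                           // prints 400
        System.out.println(httpStatusFor(new IllegalStateException("boom"))); // prints 500
    }
}
```

With something along these lines applied in one place, a downstream system could tell client errors (4xx) from genuine server failures (5xx) without parsing the stack trace.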
Thanks!
-mp.