[
https://issues.apache.org/jira/browse/SOLR-17394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated SOLR-17394:
----------------------------------
Labels: pull-request-available (was: )
> IndexFetcher should inspect HTTP status codes on its requests
> -------------------------------------------------------------
>
> Key: SOLR-17394
> URL: https://issues.apache.org/jira/browse/SOLR-17394
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: replication (java)
> Affects Versions: main (10.0), 9.6.1
> Reporter: Jason Gerlowski
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Typically, SolrJ will look at the HTTP status code of a response and it will
> throw exceptions as appropriate (see
> [here|https://github.com/apache/solr/blob/988e9e3c2666784638475f4cd4827bc2fe77c707/solr/solrj/src/java/org/apache/solr/client/solrj/impl/HttpSolrClientBase.java#L187-L217]).
> But it skips this logic if users have elected to parse the response
> themselves by use of an "InputStreamResponseParser".
> Solr's IndexFetcher uses this "InputStreamResponseParser" so that it can
> access the binary index data in the HTTP response. But it doesn't check the
> status code of responses as it should.
> IndexFetcher will typically notice that the response is unexpected and can
> retry and ultimately succeed, but this happens relatively late in the
> process. And that delay can be very expensive in many cases. For instance:
> When IndexFetcher gets a "filecontent" response, it expects the first few
> bytes to indicate the size of the binary response. So it [reads these bytes
> and instantiates a byte-array of the indicated
> size|https://github.com/apache/solr/blob/988e9e3c2666784638475f4cd4827bc2fe77c707/solr/core/src/java/org/apache/solr/handler/IndexFetcher.java#L1836C1-L1847].
> But if IndexFetcher happens to be reading a 404 response, the first few
> bytes of the response will be the '*<*', '*h*', '*e*', and '*a*' characters
> from the "*<head*>" tag that Solr uses to begin all its HTML errors. This
> leads to IndexFetcher allocating a massive > 1GB byte-array! This can cause
> GC churn in production and (for me at least) was causing test runs to
> frequently OOM on certain machines.
> We should have IndexFetcher (and other places that use
> InputStreamResponseParser) check the response status code as soon as its
> available and handle errors accordingly.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]