bq. It's now part of HttpClient.

Were you referring to line 230 of HttpClientUtil on master?

cm.setValidateAfterInactivity(Integer.getInteger(VALIDATE_AFTER_INACTIVITY, VALIDATE_AFTER_INACTIVITY_DEFAULT));
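For context, a rough sketch of the idea behind that setting as I understand it: a pooled connection that has sat idle longer than a threshold gets re-checked before it is reused, while a recently used one skips the check. This is plain Java, not the HttpClient implementation. The PooledConnection class, the canReuse helper, and the property name and default value below are assumptions made up for illustration.

```java
import java.util.function.BooleanSupplier;

public class ValidateAfterInactivitySketch {
    // Assumed property name and default (milliseconds); illustrative only.
    static final String VALIDATE_AFTER_INACTIVITY = "validateAfterInactivity";
    static final int VALIDATE_AFTER_INACTIVITY_DEFAULT = 3000;

    /** Hypothetical pooled connection that remembers when it was last used. */
    static final class PooledConnection {
        final BooleanSupplier staleCheck; // returns true if the peer has closed the socket
        final long lastUsedMillis;

        PooledConnection(BooleanSupplier staleCheck, long lastUsedMillis) {
            this.staleCheck = staleCheck;
            this.lastUsedMillis = lastUsedMillis;
        }
    }

    /**
     * Decides whether a connection may be handed out at time nowMillis.
     * A non-positive threshold disables validation in this sketch.
     */
    static boolean canReuse(PooledConnection c, long nowMillis, int validateAfterMillis) {
        long idleMillis = nowMillis - c.lastUsedMillis;
        if (validateAfterMillis <= 0 || idleMillis < validateAfterMillis) {
            return true; // recently used (or validation disabled): skip the check
        }
        // Idle for too long: only reuse if the stale check passes.
        return !c.staleCheck.getAsBoolean();
    }

    public static void main(String[] args) {
        // Same lookup pattern as the line quoted above: system property with a fallback.
        int threshold = Integer.getInteger(VALIDATE_AFTER_INACTIVITY,
                                           VALIDATE_AFTER_INACTIVITY_DEFAULT);

        // A connection whose peer has already gone away, last used at t = 0.
        PooledConnection dead = new PooledConnection(() -> true, 0L);

        System.out.println("reuse shortly after use: " + canReuse(dead, 1L, threshold));
        System.out.println("reuse after long idle:   " + canReuse(dead, Long.MAX_VALUE, threshold));
    }
}
```

The point of the threshold is that the stale check costs something on every lease, so it is only paid for connections that have been idle long enough for the other side to have plausibly dropped them.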
On Fri, Jun 17, 2016 at 12:13 PM, Varun Thacker <[email protected]> wrote:

> Hi Mark,
>
> We were running Solr 5.4.1 on a 4 node machine and a 2 shard 2 replica
> collection.
> The test data is roughly 30M large documents. The indexing process is via
> map-reduce and there are 80 parallel reducers sending a batch of 500
> documents to solr at a go.
>
> In this setup almost all runs hit the NoHttpResponseException b/w leader
> and replica once.
>
> "It's now part of HttpClient." - Sorry I didn't quite follow what's part of
> HttpClient?
>
>
>
> On Fri, Jun 17, 2016 at 6:51 AM, Mark Miller <[email protected]>
> wrote:
>
>> I'm sorry, you say it's easy to reproduce, but can you explain roughly
>> what you are doing to reproduce it?
>>
>> Mark
>>
>> On Thu, Jun 16, 2016 at 9:20 PM Mark Miller <[email protected]>
>> wrote:
>>
>>> That's already how things work. It's now part of HttpClient. There are
>>> some settings you can mess with. Is it easy to reproduce?
>>>
>>> Mark
>>> On Thu, Jun 16, 2016 at 1:15 PM Varun Thacker <
>>> [email protected]> wrote:
>>>
>>>> When running a bulk index process occasionally we see a
>>>> NoHttpResponseException error when the leader is forwarding docs to the
>>>> replica. I think this is a known issue and can be reproduced pretty easily.
>>>>
>>>> What makes me want to dig more is that because of one such
>>>> NoHttpResponseException the leader will put the replica into recovery. The
>>>> replica can never catch up because the indexing throughput is quite high.
>>>> This can add hours of recovery time for the replica depending on how many
>>>> documents one is indexing.
>>>>
>>>> So from what I can think we have two options here -
>>>> 1. Implement a thread which removes stale connections. This has been
>>>> discussed on https://issues.apache.org/jira/browse/SOLR-4509 in the
>>>> past
>>>> 2. The above solution is not the right way forward. The main problem
>>>> here is that replicas can't catch up because Solr doesn't implement
>>>> backpressure yet and implementing that would be the correct solution here
>>>>
>>>> Does anyone have an opinion on how we should go forward with this
>>>> issue?
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>>
>>>> Regards,
>>>> Varun Thacker
>>>>
>>> --
>>> - Mark
>>> about.me/markrmiller
>>>
>> --
>> - Mark
>> about.me/markrmiller
>>
>
>
>
> --
>
>
> Regards,
> Varun Thacker
>

--
Regards,
Varun Thacker
