Ah, forgot to mention, it's only on 7x.

Mark

On Fri, Jun 17, 2016 at 5:06 AM Varun Thacker <[email protected]> wrote:
> bq. It's now part of HttpClient.
>
> Were you referring to line 230 of HttpClientUtil on master?
> cm.setValidateAfterInactivity(Integer.getInteger(VALIDATE_AFTER_INACTIVITY,
> VALIDATE_AFTER_INACTIVITY_DEFAULT));
>
> On Fri, Jun 17, 2016 at 12:13 PM, Varun Thacker <[email protected]> wrote:
>
>> Hi Mark,
>>
>> We were running Solr 5.4.1 on a 4 node machine and a 2 shard 2 replica
>> collection.
>> The test data is roughly 30M large documents. The indexing process is via
>> map-reduce and there are 80 parallel reducers, each sending a batch of 500
>> documents to Solr at a go.
>>
>> In this setup almost all runs hit the NoHttpResponseException b/w leader
>> and replica once.
>>
>> "It's now part of HttpClient." - Sorry, I didn't quite follow what's part
>> of HttpClient?
>>
>> On Fri, Jun 17, 2016 at 6:51 AM, Mark Miller <[email protected]> wrote:
>>
>>> I'm sorry, you say it's easy to reproduce, but can you explain roughly
>>> what you are doing to reproduce it?
>>>
>>> Mark
>>>
>>> On Thu, Jun 16, 2016 at 9:20 PM Mark Miller <[email protected]> wrote:
>>>
>>>> That's already how things work. It's now part of HttpClient. There are
>>>> some settings you can mess with. Is it easy to reproduce?
>>>>
>>>> Mark
>>>>
>>>> On Thu, Jun 16, 2016 at 1:15 PM Varun Thacker <[email protected]> wrote:
>>>>
>>>>> When running a bulk index process we occasionally see a
>>>>> NoHttpResponseException error when the leader is forwarding docs to the
>>>>> replica. I think this is a known issue and can be reproduced pretty
>>>>> easily.
>>>>>
>>>>> What makes me want to dig more is that because of one such
>>>>> NoHttpResponseException the leader will put the replica into recovery. The
>>>>> replica can never catch up because the indexing throughput is quite high.
>>>>> This can add hours of recovery time for the replica depending on how many
>>>>> documents one is indexing.
>>>>>
>>>>> So from what I can think we have two options here -
>>>>> 1. Implement a thread which removes stale connections. This has been
>>>>> discussed on https://issues.apache.org/jira/browse/SOLR-4509 in the
>>>>> past.
>>>>> 2. The above solution is not the right way forward. The main problem
>>>>> here is that replicas can't catch up because Solr doesn't implement
>>>>> backpressure yet, and implementing that would be the correct solution here.
>>>>>
>>>>> Does anyone have an opinion on how we should go forward with this
>>>>> issue?
>>>>>
>>>>> --
>>>>>
>>>>> Regards,
>>>>> Varun Thacker
>>>>>
>>>> --
>>>> - Mark
>>>> about.me/markrmiller
>>>>
>>> --
>>> - Mark
>>> about.me/markrmiller
>>>
>>
>> --
>>
>> Regards,
>> Varun Thacker
>>
>
> --
>
> Regards,
> Varun Thacker
>
--
- Mark
about.me/markrmiller
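
For readers following the thread, here is a rough sketch of the "validate after inactivity" idea behind the setValidateAfterInactivity call quoted above. It is a toy pool in plain Java, not HttpClient's or Solr's actual code. The class name ValidatingPool, the isStale probe and the injectable clock are all hypothetical, made up for illustration. The assumption is that a pooled connection is only checked for staleness when it has sat idle for at least the configured number of milliseconds.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.LongSupplier;
import java.util.function.Predicate;

/** Toy pool illustrating "validate after inactivity". Hypothetical, not HttpClient code. */
final class ValidatingPool<C> {

    /** An idle connection plus the time it was handed back to the pool. */
    private static final class Idle<C> {
        final C conn;
        final long releasedAtMillis;

        Idle(C conn, long releasedAtMillis) {
            this.conn = conn;
            this.releasedAtMillis = releasedAtMillis;
        }
    }

    private final Deque<Idle<C>> idle = new ArrayDeque<>();
    private final long validateAfterInactivityMillis;
    private final Predicate<C> isStale; // hypothetical staleness probe
    private final LongSupplier clock;   // injectable so the behaviour can be tested

    ValidatingPool(long validateAfterInactivityMillis,
                   Predicate<C> isStale,
                   LongSupplier clock) {
        this.validateAfterInactivityMillis = validateAfterInactivityMillis;
        this.isStale = isStale;
        this.clock = clock;
    }

    /** Hand a connection back to the pool and remember when that happened. */
    synchronized void release(C conn) {
        idle.push(new Idle<>(conn, clock.getAsLong()));
    }

    /**
     * Returns a pooled connection, or null if the caller must open a new one.
     * Connections idle for less than the threshold are reused without a check;
     * older ones are probed first and dropped if they turn out to be stale.
     */
    synchronized C lease() {
        while (!idle.isEmpty()) {
            Idle<C> entry = idle.pop();
            long idleFor = clock.getAsLong() - entry.releasedAtMillis;
            if (idleFor < validateAfterInactivityMillis || !isStale.test(entry.conn)) {
                return entry.conn;
            }
            // Stale after a long idle period: discard it and try the next one.
        }
        return null;
    }
}
```

Note the trade-off this sketch makes visible: a connection the server closed within the threshold is still handed out unchecked, so a request on it can still fail. A lower threshold narrows that window at the cost of more staleness checks.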
