No, you can't do it well enough with the built-in support. You can follow that ticket and see that that's how I started. If it's a big enough issue, we should backport that to 6x and deal with the back-compat breaks.
On Fri, Jun 17, 2016 at 8:11 AM Varun Thacker <[email protected]> wrote:

> Hi Mark,
>
> So for the 6.x line do you think we should add a background thread which
> expires idle connections and expired connections? Or do you have any
> other recommendations?
>
> On Fri, Jun 17, 2016 at 5:25 PM, Mark Miller <[email protected]>
> wrote:
>
>> Ah, forgot to mention, it's only on 7x.
>>
>> Mark
>>
>> On Fri, Jun 17, 2016 at 5:06 AM Varun Thacker <[email protected]>
>> wrote:
>>
>>> bq. It's now part of HttpClient.
>>>
>>> Were you referring to Line230 of HttpClientUtil on master? -
>>> cm.setValidateAfterInactivity(Integer.getInteger(VALIDATE_AFTER_INACTIVITY,
>>> VALIDATE_AFTER_INACTIVITY_DEFAULT));
>>>
>>> On Fri, Jun 17, 2016 at 12:13 PM, Varun Thacker <
>>> [email protected]> wrote:
>>>
>>>> Hi Mark,
>>>>
>>>> We were running Solr 5.4.1 on a 4 node machine and a 2 shard 2 replica
>>>> collection.
>>>> The test data is roughly 30M large documents. The indexing process is
>>>> via map-reduce and there are 80 parallel reducers sending a batch of 500
>>>> documents to Solr at a go.
>>>>
>>>> In this setup almost all runs hit the NoHttpResponseException between
>>>> leader and replica once.
>>>>
>>>> "It's now part of HttpClient." - Sorry, I didn't quite follow what's part
>>>> of HttpClient?
>>>>
>>>> On Fri, Jun 17, 2016 at 6:51 AM, Mark Miller <[email protected]>
>>>> wrote:
>>>>
>>>>> I'm sorry, you say it's easy to reproduce, but can you explain roughly
>>>>> what you are doing to reproduce it?
>>>>>
>>>>> Mark
>>>>>
>>>>> On Thu, Jun 16, 2016 at 9:20 PM Mark Miller <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> That's already how things work. It's now part of HttpClient. There
>>>>>> are some settings you can mess with. Is it easy to reproduce?
>>>>>>
>>>>>> Mark
>>>>>> On Thu, Jun 16, 2016 at 1:15 PM Varun Thacker <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> When running a bulk index process, occasionally we see a
>>>>>>> NoHttpResponseException error when the leader is forwarding docs to
>>>>>>> the replica. I think this is a known issue and can be reproduced
>>>>>>> pretty easily.
>>>>>>>
>>>>>>> What makes me want to dig more is that because of one such
>>>>>>> NoHttpResponseException the leader will put the replica into
>>>>>>> recovery. The replica can never catch up because the indexing
>>>>>>> throughput is quite high. This can add hours of recovery time for
>>>>>>> the replica depending on how many documents one is indexing.
>>>>>>>
>>>>>>> So from what I can think we have two options here -
>>>>>>> 1. Implement a thread which removes stale connections. This has been
>>>>>>> discussed on https://issues.apache.org/jira/browse/SOLR-4509 in the
>>>>>>> past
>>>>>>> 2. The above solution is not the right way forward. The main problem
>>>>>>> here is that replicas can't catch up because Solr doesn't implement
>>>>>>> backpressure yet and implementing that would be the correct solution
>>>>>>> here
>>>>>>>
>>>>>>> Does anyone have an opinion on how we should go forward with this
>>>>>>> issue?
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> Regards,
>>>>>>> Varun Thacker
>>>>>>>
>>>>>> --
>>>>>> - Mark
>>>>>> about.me/markrmiller
>>>>>>
>>>>> --
>>>>> - Mark
>>>>> about.me/markrmiller
>>>>>
>>>>
>>>> --
>>>>
>>>> Regards,
>>>> Varun Thacker
>>>>
>>>
>>> --
>>>
>>> Regards,
>>> Varun Thacker
>>>
>> --
>> - Mark
>> about.me/markrmiller
>>
>
> --
>
> Regards,
> Varun Thacker
>
--
- Mark
about.me/markrmiller
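
For readers following the thread, the "background thread which expires idle connections and expired connections" idea (option 1 in the original message) could look roughly like the JDK-only sketch below. This is not Solr's or HttpClient's actual implementation. `EvictablePool` and `StaleConnectionSweeper` are hypothetical names invented for illustration; the two pool methods only mirror `closeExpiredConnections()` and `closeIdleConnections(long, TimeUnit)` on HttpClient 4.x's `HttpClientConnectionManager`. The sweep period and idle threshold are assumptions that would need tuning against the server's keep-alive timeout.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a connection pool (names mirror HttpClient 4.x). */
interface EvictablePool {
  void closeExpiredConnections();
  void closeIdleConnections(long idleTime, TimeUnit unit);
}

/** Periodically asks the pool to drop expired and long-idle connections. */
final class StaleConnectionSweeper implements AutoCloseable {
  private final ScheduledExecutorService scheduler;

  StaleConnectionSweeper(EvictablePool pool, long periodMs, long maxIdleMs) {
    // Daemon thread so the sweeper never keeps the JVM alive on its own.
    scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
      Thread t = new Thread(r, "stale-connection-sweeper");
      t.setDaemon(true);
      return t;
    });
    scheduler.scheduleWithFixedDelay(
        () -> sweep(pool, maxIdleMs), periodMs, periodMs, TimeUnit.MILLISECONDS);
  }

  /** One sweep: close expired connections, then those idle longer than maxIdleMs. */
  static void sweep(EvictablePool pool, long maxIdleMs) {
    try {
      pool.closeExpiredConnections();
      pool.closeIdleConnections(maxIdleMs, TimeUnit.MILLISECONDS);
    } catch (RuntimeException e) {
      // Swallowed on purpose: an uncaught exception would cancel the schedule.
    }
  }

  @Override
  public void close() {
    scheduler.shutdownNow();
  }
}
```

A sweeper like this only narrows the window in which a request can land on a connection the server has already closed; it does not remove it, so retry handling for NoHttpResponseException would presumably still be needed alongside it.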
