Hi Mark,

So for the 6.x line, do you think we should add a background thread that closes idle and expired connections? Or do you have any other recommendations?
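To make the background-thread idea concrete, here is a rough sketch, not a proposed patch. It assumes the pool exposes the two eviction methods that HttpClient 4.x's HttpClientConnectionManager provides (closeExpiredConnections() and closeIdleConnections(long, TimeUnit)). EvictablePool, IdleConnectionEvictor and CountingPool are hypothetical names used only for illustration, so the sketch compiles without HttpClient on the classpath.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the two eviction methods on HttpClient 4.x's
// HttpClientConnectionManager; a real version would wrap the actual manager.
interface EvictablePool {
    void closeExpiredConnections();
    void closeIdleConnections(long idleTime, TimeUnit unit);
}

// Hypothetical helper: periodically sweeps a pool on a daemon thread.
final class IdleConnectionEvictor {
    private final EvictablePool pool;
    private final long maxIdleMillis;
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "idle-connection-evictor");
            t.setDaemon(true); // do not keep the JVM alive just for eviction
            return t;
        });

    public IdleConnectionEvictor(EvictablePool pool, long maxIdle, TimeUnit unit) {
        this.pool = pool;
        this.maxIdleMillis = unit.toMillis(maxIdle);
    }

    // One sweep: close connections past their expiry, then those idle too long.
    public void sweepOnce() {
        pool.closeExpiredConnections();
        pool.closeIdleConnections(maxIdleMillis, TimeUnit.MILLISECONDS);
    }

    // Schedule sweeps; an uncaught exception would cancel future runs, so catch it.
    public void start(long period, TimeUnit unit) {
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                sweepOnce();
            } catch (RuntimeException e) {
                // swallow and keep sweeping; real code would log this
            }
        }, period, period, unit);
    }

    public void shutdown() {
        scheduler.shutdownNow();
    }
}

// Fake pool that only records sweeps, useful for trying the evictor out.
final class CountingPool implements EvictablePool {
    final AtomicInteger expiredSweeps = new AtomicInteger();
    final AtomicInteger idleSweeps = new AtomicInteger();
    volatile long lastIdleMillis = -1;

    public void closeExpiredConnections() {
        expiredSweeps.incrementAndGet();
    }

    public void closeIdleConnections(long idleTime, TimeUnit unit) {
        lastIdleMillis = unit.toMillis(idleTime);
        idleSweeps.incrementAndGet();
    }
}
```

In real use one would presumably adapt the shared connection manager to EvictablePool and call start(...) with a period shorter than the server-side idle timeout. The actual values are an open question, and this does nothing about the backpressure problem.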
On Fri, Jun 17, 2016 at 5:25 PM, Mark Miller <[email protected]> wrote:

> Ah, forgot to mention, it's only on 7x.
>
> Mark
>
> On Fri, Jun 17, 2016 at 5:06 AM Varun Thacker <[email protected]>
> wrote:
>
>> bq. It's now part of HttpClient.
>>
>> Were you referring to Line230 of HttpClientUtil on master ? -
>> cm.setValidateAfterInactivity(Integer.getInteger(VALIDATE_AFTER_INACTIVITY,
>> VALIDATE_AFTER_INACTIVITY_DEFAULT));
>>
>> On Fri, Jun 17, 2016 at 12:13 PM, Varun Thacker <
>> [email protected]> wrote:
>>
>>> Hi Mark,
>>>
>>> We were running Solr 5.4.1 on a 4 node machine and a 2 shard 2 replica
>>> collection.
>>> The test data is roughly 30M large documents. The indexing process is
>>> via map-reduce and there are 80 parallel reducers sending a batch of 500
>>> documents to solr at a go.
>>>
>>> In this setup almost all runs hit the NoHttpResponseException b/w leader
>>> and replica once.
>>>
>>> "It's now part of HttpClient." - Sorry I didn't quite follow whats part
>>> of HttpClient?
>>>
>>> On Fri, Jun 17, 2016 at 6:51 AM, Mark Miller <[email protected]>
>>> wrote:
>>>
>>>> I'm sorry, you say it's easy to reproduce, but can you explain roughly
>>>> what you are doing to reproduce it?
>>>>
>>>> Mark
>>>>
>>>> On Thu, Jun 16, 2016 at 9:20 PM Mark Miller <[email protected]>
>>>> wrote:
>>>>
>>>>> That's already how things work. It's now part of HttpClient. There are
>>>>> some settings you can mess with. Is it easy to reproduce?
>>>>>
>>>>> Mark
>>>>> On Thu, Jun 16, 2016 at 1:15 PM Varun Thacker <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> When running a bulk index process occasionally we see a
>>>>>> NoHttpResponseException error when the leader is forwarding docs to the
>>>>>> replica. I think this is a known issue and can be reproduced pretty
>>>>>> easily.
>>>>>>
>>>>>> What makes me want to dig more is that because of one such
>>>>>> NoHttpResponseException the leader will put the replica into recovery.
>>>>>> The replica can never catch up because the indexing throughput is quite
>>>>>> high. This can add hours of recovery time for the replica depending on
>>>>>> how many documents one is indexing.
>>>>>>
>>>>>> So from what I can think we have two options here -
>>>>>> 1. Implement a thread which removes stale connections. This has been
>>>>>> discussed on https://issues.apache.org/jira/browse/SOLR-4509 in the
>>>>>> past
>>>>>> 2. The above solution is not the right way forward. The main problem
>>>>>> here is that replicas can't catch up because Solr doesn't implement
>>>>>> backpressure yet and implementing that would be the correct solution here
>>>>>>
>>>>>> Does anyone have an opinion on how we should we go forward with this
>>>>>> issue?
>>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>> Varun Thacker
>>>>>>
>>>>> --
>>>>> - Mark
>>>>> about.me/markrmiller
>>>>>
>>>> --
>>>> - Mark
>>>> about.me/markrmiller
>>>>
>>> --
>>> Regards,
>>> Varun Thacker
>>>
>> --
>> Regards,
>> Varun Thacker
>>
> --
> - Mark
> about.me/markrmiller
>
--
Regards,
Varun Thacker
