Re: NoHttpResponseException error between leader and replica

Mark Miller Fri, 17 Jun 2016 06:54:11 -0700

No, you can't do it well enough with the built-in support. You can follow that
ticket and see that is how I started. If it's a big enough issue, we should
backport that to 6x and deal with the back-compat breaks.


On Fri, Jun 17, 2016 at 8:11 AM Varun Thacker <[email protected]>
wrote:

> Hi Mark,
>
> So for the 6.x line, do you think we should add a background thread which
> closes idle and expired connections? Or do you have any other
> recommendations?
>
> On Fri, Jun 17, 2016 at 5:25 PM, Mark Miller <[email protected]>
> wrote:
>
>> Ah, forgot to mention, it's only on 7x.
>>
>> Mark
>>
>> On Fri, Jun 17, 2016 at 5:06 AM Varun Thacker <[email protected]>
>> wrote:
>>
>>> bq. It's now part of HttpClient.
>>>
>>> Were you referring to line 230 of HttpClientUtil on master?
>>>
>>>   cm.setValidateAfterInactivity(
>>>       Integer.getInteger(VALIDATE_AFTER_INACTIVITY, VALIDATE_AFTER_INACTIVITY_DEFAULT));
>>>
>>> On Fri, Jun 17, 2016 at 12:13 PM, Varun Thacker <
>>> [email protected]> wrote:
>>>
>>>> Hi Mark,
>>>>
>>>> We were running Solr 5.4.1 on a 4-node machine with a 2-shard, 2-replica
>>>> collection.
>>>> The test data is roughly 30M large documents. Indexing runs via
>>>> map-reduce, with 80 parallel reducers sending batches of 500 documents
>>>> to Solr at a time.
>>>>
>>>> In this setup, almost all runs hit the NoHttpResponseException between
>>>> leader and replica once.
>>>>
>>>> "It's now part of HttpClient." - Sorry, I didn't quite follow: what's
>>>> part of HttpClient?
>>>>
>>>>
>>>>
>>>> On Fri, Jun 17, 2016 at 6:51 AM, Mark Miller <[email protected]>
>>>> wrote:
>>>>
>>>>> I'm sorry, you say it's easy to reproduce, but can you explain roughly
>>>>> what you are doing to reproduce it?
>>>>>
>>>>> Mark
>>>>>
>>>>> On Thu, Jun 16, 2016 at 9:20 PM Mark Miller <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> That's already how things work. It's now part of HttpClient. There
>>>>>> are some settings you can mess with. Is it easy to reproduce?
>>>>>>
>>>>>> Mark
>>>>>> On Thu, Jun 16, 2016 at 1:15 PM Varun Thacker <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> When running a bulk indexing process, we occasionally see a
>>>>>>> NoHttpResponseException error when the leader is forwarding docs to the
>>>>>>> replica. I think this is a known issue and can be reproduced pretty
>>>>>>> easily.
>>>>>>>
>>>>>>> What makes me want to dig more is that because of one such
>>>>>>> NoHttpResponseException the leader will put the replica into recovery.
>>>>>>> The replica can never catch up because the indexing throughput is quite
>>>>>>> high. This can add hours of recovery time for the replica, depending on
>>>>>>> how many documents one is indexing.
>>>>>>>
>>>>>>> So from what I can tell, we have two options here:
>>>>>>> 1. Implement a thread which removes stale connections. This has been
>>>>>>> discussed on https://issues.apache.org/jira/browse/SOLR-4509 in the
>>>>>>> past.
>>>>>>> 2. The above solution is not the right way forward. The main problem
>>>>>>> here is that replicas can't catch up because Solr doesn't implement
>>>>>>> backpressure yet, and implementing that would be the correct solution
>>>>>>> here.
>>>>>>>
>>>>>>> Does anyone have an opinion on how we should go forward with this
>>>>>>> issue?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>>
>>>>>>> Regards,
>>>>>>> Varun Thacker
>>>>>>>
>>>>>> --
>>>>>> - Mark
>>>>>> about.me/markrmiller
>>>>>>
>>>>> --
>>>>> - Mark
>>>>> about.me/markrmiller
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>>
>>>> Regards,
>>>> Varun Thacker
>>>>
>>>
>>>
>>>
>>> --
>>>
>>>
>>> Regards,
>>> Varun Thacker
>>>
>> --
>> - Mark
>> about.me/markrmiller
>>
>
>
>
> --
>
>
> Regards,
> Varun Thacker
>
-- 
- Mark
about.me/markrmiller
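
For readers of the archive, the "thread which removes stale connections"
idea from the quoted messages can be sketched roughly as below. This is a
toy illustration only, not Solr's or HttpClient's code: the class
IdleConnectionEvictor, its methods, and the use of string ids in place of
real connections are all hypothetical, and it uses only the JDK. With
Apache HttpClient 4.x the eviction step would more likely call the
connection manager's closeExpiredConnections() and closeIdleConnections(...)
rather than maintain its own map.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical toy pool: remembers when each pooled connection was last released. */
class IdleConnectionEvictor {
    // Connection id -> time (ms) at which it was returned to the pool.
    private final Map<String, Long> lastUsedMillis = new ConcurrentHashMap<>();
    private final long maxIdleMillis;

    IdleConnectionEvictor(long maxIdleMillis) {
        this.maxIdleMillis = maxIdleMillis;
    }

    /** Return a connection to the pool, recording the release time. */
    void release(String connectionId, long nowMillis) {
        lastUsedMillis.put(connectionId, nowMillis);
    }

    /** Take a connection out of the pool; false if it was already evicted. */
    boolean lease(String connectionId) {
        return lastUsedMillis.remove(connectionId) != null;
    }

    /** Drop every pooled connection idle for longer than maxIdleMillis. */
    int evictIdle(long nowMillis) {
        int evicted = 0;
        for (Map.Entry<String, Long> e : lastUsedMillis.entrySet()) {
            boolean tooOld = nowMillis - e.getValue() > maxIdleMillis;
            // remove(key, value) only succeeds if the entry was not re-released meanwhile.
            if (tooOld && lastUsedMillis.remove(e.getKey(), e.getValue())) {
                evicted++;
            }
        }
        return evicted;
    }

    int size() {
        return lastUsedMillis.size();
    }

    /** Run evictIdle periodically on a daemon thread; caller shuts the executor down. */
    ScheduledExecutorService startBackgroundEviction(long periodMillis) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "idle-connection-evictor");
            t.setDaemon(true);
            return t;
        });
        ses.scheduleWithFixedDelay(
            () -> evictIdle(System.currentTimeMillis()),
            periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return ses;
    }
}
```

Note that a sweep like this only narrows the window in which a connection
closed by the remote side can still be leased; it cannot close it entirely,
which fits the point in the reply above that this approach alone is not
enough.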
