Re: NoHttpResponseException error between leader and replica

Varun Thacker Fri, 17 Jun 2016 05:12:07 -0700

Hi Mark,

So for the 6.x line, do you think we should add a background thread that
closes idle and expired connections? Or do you have any other
recommendations?
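
To make the proposal concrete, here is a rough sketch of what such a thread could look like. It is not Solr code. EvictableConnectionPool and IdleConnectionReaper are hypothetical names; the interface only mirrors the closeExpiredConnections() and closeIdleConnections(long, TimeUnit) methods that HttpClient 4.x exposes on HttpClientConnectionManager, so the example compiles without HttpClient on the classpath. The period and idle limit are placeholder parameters, not tuned values.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical stand-in for the two eviction methods found on
 * HttpClient 4.x's HttpClientConnectionManager.
 */
interface EvictableConnectionPool {
    void closeExpiredConnections();
    void closeIdleConnections(long idleTime, TimeUnit unit);
}

/** Periodically evicts expired and idle connections from a pool. */
final class IdleConnectionReaper implements AutoCloseable {
    private final ScheduledExecutorService scheduler;

    IdleConnectionReaper(EvictableConnectionPool pool, long periodMs, long maxIdleMs) {
        // A daemon thread, so the reaper never keeps the JVM alive on its own.
        scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "idle-connection-reaper");
            t.setDaemon(true);
            return t;
        });
        scheduler.scheduleWithFixedDelay(
            () -> sweep(pool, maxIdleMs), periodMs, periodMs, TimeUnit.MILLISECONDS);
    }

    /** One eviction pass: expired connections first, then long-idle ones. */
    static void sweep(EvictableConnectionPool pool, long maxIdleMs) {
        try {
            pool.closeExpiredConnections();
            pool.closeIdleConnections(maxIdleMs, TimeUnit.MILLISECONDS);
        } catch (RuntimeException e) {
            // Swallowed on purpose: an exception escaping a scheduled task
            // would cancel all later runs of that task.
        }
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

With the real library, a PoolingHttpClientConnectionManager could be adapted by delegating both interface methods to it. Even then, this only narrows the window: the remote side can still close a connection between two sweeps, so a sweep like this reduces stale-connection errors but may not eliminate them.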


On Fri, Jun 17, 2016 at 5:25 PM, Mark Miller <[email protected]> wrote:

> Ah, forgot to mention, it's only on 7x.
>
> Mark
>
> On Fri, Jun 17, 2016 at 5:06 AM Varun Thacker <[email protected]>
> wrote:
>
>> bq. It's now part of HttpClient.
>>
>> Were you referring to this call at line 230 of HttpClientUtil on master?
>> cm.setValidateAfterInactivity(Integer.getInteger(VALIDATE_AFTER_INACTIVITY,
>> VALIDATE_AFTER_INACTIVITY_DEFAULT));
>>
>> On Fri, Jun 17, 2016 at 12:13 PM, Varun Thacker <
>> [email protected]> wrote:
>>
>>> Hi Mark,
>>>
>>> We were running Solr 5.4.1 on a 4 node machine and a 2 shard 2 replica
>>> collection.
>>> The test data is roughly 30M large documents. The indexing process runs
>>> via map-reduce, with 80 parallel reducers each sending a batch of 500
>>> documents to Solr at a time.
>>>
>>> In this setup almost all runs hit the NoHttpResponseException between
>>> leader and replica once.
>>>
>>> "It's now part of HttpClient." - Sorry, I didn't quite follow what's part
>>> of HttpClient?
>>>
>>>
>>>
>>> On Fri, Jun 17, 2016 at 6:51 AM, Mark Miller <[email protected]>
>>> wrote:
>>>
>>>> I'm sorry, you say it's easy to reproduce, but can you explain roughly
>>>> what you are doing to reproduce it?
>>>>
>>>> Mark
>>>>
>>>> On Thu, Jun 16, 2016 at 9:20 PM Mark Miller <[email protected]>
>>>> wrote:
>>>>
>>>>> That's already how things work. It's now part of HttpClient. There are
>>>>> some settings you can mess with. Is it easy to reproduce?
>>>>>
>>>>> Mark
>>>>> On Thu, Jun 16, 2016 at 1:15 PM Varun Thacker <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> When running a bulk index process occasionally we see a
>>>>>> NoHttpResponseException error when the leader is forwarding docs to the
>>>>>> replica. I think this is a known issue and can be reproduced pretty
>>>>>> easily.
>>>>>>
>>>>>> What makes me want to dig more is that because of one such
>>>>>> NoHttpResponseException the leader will put the replica into recovery.
>>>>>> The replica can never catch up because the indexing throughput is quite
>>>>>> high. This can add hours of recovery time for the replica depending on
>>>>>> how many documents one is indexing.
>>>>>>
>>>>>> So as far as I can tell, we have two options here:
>>>>>> 1. Implement a thread which removes stale connections. This has been
>>>>>> discussed on https://issues.apache.org/jira/browse/SOLR-4509 in the
>>>>>> past.
>>>>>> 2. The above solution is not the right way forward. The main problem
>>>>>> here is that replicas can't catch up because Solr doesn't implement
>>>>>> backpressure yet, and implementing that would be the correct solution
>>>>>> here.
>>>>>>
>>>>>> Does anyone have an opinion on how we should go forward with this
>>>>>> issue?
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>> Varun Thacker
>>>>>>
>>>>> --
>>>>> - Mark
>>>>> about.me/markrmiller
>>>>>
>>>> --
>>>> - Mark
>>>> about.me/markrmiller
>>>>
>>>
>>>
>>>
>>> --
>>>
>>>
>>> Regards,
>>> Varun Thacker
>>>
>>
>>
>>
>> --
>>
>>
>> Regards,
>> Varun Thacker
>>
> --
> - Mark
> about.me/markrmiller
>



-- 


Regards,
Varun Thacker
