Re: NoHttpResponseException error between leader and replica

Mark Miller Fri, 17 Jun 2016 04:56:51 -0700

Ah, forgot to mention, it's only on 7x.

Mark
On Fri, Jun 17, 2016 at 5:06 AM Varun Thacker <[email protected]>
wrote:


> bq. It's now part of HttpClient.
>
> Were you referring to line 230 of HttpClientUtil on master?
> cm.setValidateAfterInactivity(Integer.getInteger(VALIDATE_AFTER_INACTIVITY,
> VALIDATE_AFTER_INACTIVITY_DEFAULT));
>
> On Fri, Jun 17, 2016 at 12:13 PM, Varun Thacker <
> [email protected]> wrote:
>
>> Hi Mark,
>>
>> We were running Solr 5.4.1 on a 4 node machine with a 2 shard, 2 replica
>> collection.
>> The test data is roughly 30M large documents. Indexing is done via
>> map-reduce, with 80 parallel reducers each sending batches of 500
>> documents to Solr.
>>
>> In this setup, almost all runs hit the NoHttpResponseException between
>> leader and replica once.
>>
>> "It's now part of HttpClient." - Sorry, I didn't quite follow: what's
>> part of HttpClient?
>>
>>
>>
>> On Fri, Jun 17, 2016 at 6:51 AM, Mark Miller <[email protected]>
>> wrote:
>>
>>> I'm sorry, you say it's easy to reproduce, but can you explain roughly
>>> what you are doing to reproduce it?
>>>
>>> Mark
>>>
>>> On Thu, Jun 16, 2016 at 9:20 PM Mark Miller <[email protected]>
>>> wrote:
>>>
>>>> That's already how things work. It's now part of HttpClient. There are
>>>> some settings you can mess with. Is it easy to reproduce?
>>>>
>>>> Mark
>>>> On Thu, Jun 16, 2016 at 1:15 PM Varun Thacker <
>>>> [email protected]> wrote:
>>>>
>>>>> When running a bulk indexing process, we occasionally see a
>>>>> NoHttpResponseException error when the leader is forwarding docs to the
>>>>> replica. I think this is a known issue and can be reproduced pretty
>>>>> easily.
>>>>>
>>>>> What makes me want to dig further is that a single
>>>>> NoHttpResponseException causes the leader to put the replica into
>>>>> recovery. The replica can then never catch up because the indexing
>>>>> throughput is quite high. This can add hours of recovery time for the
>>>>> replica, depending on how many documents one is indexing.
>>>>>
>>>>> As far as I can tell, we have two options here:
>>>>> 1. Implement a thread which removes stale connections. This has been
>>>>> discussed on https://issues.apache.org/jira/browse/SOLR-4509 in the
>>>>> past.
>>>>> 2. Decide that the above is not the right way forward: the main problem
>>>>> here is that replicas can't catch up because Solr doesn't implement
>>>>> backpressure yet, and implementing that would be the correct solution.
>>>>>
>>>>> Does anyone have an opinion on how we should go forward with this
>>>>> issue?
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>>
>>>>> Regards,
>>>>> Varun Thacker
>>>>>
>>>> --
>>>> - Mark
>>>> about.me/markrmiller
>>>>
>>> --
>>> - Mark
>>> about.me/markrmiller
>>>
>>
>>
>>
>> --
>>
>>
>> Regards,
>> Varun Thacker
>>
>
>
>
> --
>
>
> Regards,
> Varun Thacker
>
-- 
- Mark
about.me/markrmiller
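
Note on the setting quoted in this thread: the HttpClientUtil line passes
Integer.getInteger(VALIDATE_AFTER_INACTIVITY, VALIDATE_AFTER_INACTIVITY_DEFAULT)
to the connection manager's setValidateAfterInactivity, so the value comes
from a JVM system property with a fallback default. A minimal sketch of that
lookup using only the JDK is below. The property name "validateAfterInactivity"
and the default of 3000 ms are assumptions made for illustration, not values
confirmed in this thread; check HttpClientUtil in your Solr version. The class
and method names are hypothetical.

```java
public class ValidateAfterInactivityDemo {
    // ASSUMPTION: property name and default are illustrative only;
    // the real constants live in Solr's HttpClientUtil.
    public static final String VALIDATE_AFTER_INACTIVITY = "validateAfterInactivity";
    public static final int VALIDATE_AFTER_INACTIVITY_DEFAULT = 3000;

    // Mirrors the lookup in the quoted line: read the system property as an
    // integer, falling back to the default when it is unset or unparsable.
    public static int resolveValidateAfterInactivity() {
        return Integer.getInteger(VALIDATE_AFTER_INACTIVITY,
                                  VALIDATE_AFTER_INACTIVITY_DEFAULT);
    }

    public static void main(String[] args) {
        // With no -DvalidateAfterInactivity=... on the command line,
        // the default is used.
        System.out.println("resolved (ms): " + resolveValidateAfterInactivity());

        // Setting the property (normally done with -D at JVM startup)
        // changes the resolved value.
        System.setProperty(VALIDATE_AFTER_INACTIVITY, "1000");
        System.out.println("resolved (ms): " + resolveValidateAfterInactivity());
    }
}
```

Under these assumptions, the value would be tuned by starting the JVM with
-DvalidateAfterInactivity=<millis>. A lower value makes HttpClient re-validate
pooled connections after a shorter idle period before leasing them again.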
