When running a bulk indexing process, we occasionally see a
NoHttpResponseException when the leader is forwarding docs to the
replica. I think this is a known issue, and it can be reproduced pretty easily.

What makes me want to dig deeper is that a single such
NoHttpResponseException causes the leader to put the replica into recovery. The
replica can then never catch up because the indexing throughput is quite high.
This can add hours of recovery time for the replica, depending on how many
documents one is indexing.

As far as I can tell, we have two options here:
1. Implement a thread that removes stale connections (discussed in the past
on https://issues.apache.org/jira/browse/SOLR-4509).
2. Treat the above as the wrong way forward. The main problem here is that
replicas can't catch up because Solr doesn't implement backpressure yet, and
implementing backpressure would be the correct solution.
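To make option 1 concrete, here is a minimal sketch of a stale-connection reaper, assuming a pool that records when each connection was last used. Everything in it (`StaleConnectionReaper`, `touch`, `evictIdle`, the connection ids) is hypothetical and is not Solr's or HttpClient's actual API; a real version would presumably delegate to the HTTP client's own connection manager instead of a plain map.

```java
import java.time.Duration;
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical stand-in for a pooled connection manager: it only tracks when
 * each pooled connection was last used and evicts the ones idle too long.
 */
class StaleConnectionReaper implements AutoCloseable {
    private final Map<String, Long> lastUsedMillis = new ConcurrentHashMap<>();
    private final long maxIdleMillis;
    // Single daemon thread so the reaper never keeps the JVM alive on its own.
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "stale-connection-reaper");
            t.setDaemon(true);
            return t;
        });

    StaleConnectionReaper(Duration maxIdle) {
        this.maxIdleMillis = maxIdle.toMillis();
    }

    /** Record that a connection was just leased or returned to the pool. */
    void touch(String connectionId, long nowMillis) {
        lastUsedMillis.put(connectionId, nowMillis);
    }

    /** Evict every connection idle longer than maxIdle; returns how many were evicted. */
    int evictIdle(long nowMillis) {
        int evicted = 0;
        Iterator<Map.Entry<String, Long>> it = lastUsedMillis.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            // Conditional remove: skip the entry if it was touched again meanwhile.
            if (nowMillis - e.getValue() > maxIdleMillis
                    && lastUsedMillis.remove(e.getKey(), e.getValue())) {
                evicted++;
            }
        }
        return evicted;
    }

    int size() {
        return lastUsedMillis.size();
    }

    /** Run evictIdle on the background thread every `period`. */
    void start(Duration period) {
        long p = period.toMillis();
        scheduler.scheduleAtFixedRate(
            () -> evictIdle(System.currentTimeMillis()), p, p, TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

The idea, under the assumption that the exception comes from reusing a pooled connection the other side has already closed, is to pick a maxIdle shorter than the server's idle timeout so such connections are dropped before the leader can reuse them.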

Does anyone have an opinion on how we should go forward with this issue?



-- 


Regards,
Varun Thacker
