Noble Paul commented on SOLR-9512:

Does it make any sense to have the explicit flag {{directUpdatesToLeaders}}? 
IMHO it should be the only supported behavior. Why would we ever send a 
request to another node when the shard leader is down?

My proposal is as follows.

* The LBHttpSolrClient is aware of down servers. So, if the leader for the 
shard is down, we already know it and can fail fast.
* If we have to fail because the shard leader is dead, we should explicitly 
re-read the {{state.json}} for that collection, provided the cached state is 
older than a certain threshold (say 1 sec?). This means the client may fire a 
request to ZK as often as once every second to refresh the state. For such 
refreshes, check the version first to optimize ZK reads.
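
To make the second point concrete, here is a rough sketch of the refresh logic, not a patch. It deliberately uses no Solr or ZooKeeper classes: {{CachedState}}, {{StateSource}}, {{LeaderStateCache}} and the injected clock are hypothetical names invented for illustration, and the 1 second threshold is just the value floated above.

```java
import java.util.function.LongSupplier;

/** Hypothetical cached view of a collection's state.json; not a Solr class. */
final class CachedState {
    final int version;       // version of state.json when it was read
    final String leaderUrl;  // shard leader according to that version
    final long fetchedAtMs;  // when the cache last confirmed this view

    CachedState(int version, String leaderUrl, long fetchedAtMs) {
        this.version = version;
        this.leaderUrl = leaderUrl;
        this.fetchedAtMs = fetchedAtMs;
    }
}

/** Hypothetical stand-in for the ZK-backed state; not a Solr interface. */
interface StateSource {
    int currentVersion();      // assumed cheap: version-only check
    String currentLeaderUrl(); // assumed expensive: full read of state.json
}

final class LeaderStateCache {
    private final StateSource source;
    private final long minRefreshIntervalMs;
    private final LongSupplier clock;
    private CachedState cached;
    private int fullReads = 0;

    LeaderStateCache(StateSource source, long minRefreshIntervalMs, LongSupplier clock) {
        this.source = source;
        this.minRefreshIntervalMs = minRefreshIntervalMs;
        this.clock = clock;
        this.cached = load(source.currentVersion());
    }

    CachedState get() { return cached; }

    int fullReads() { return fullReads; }

    /**
     * Call after a request failed because the cached leader is down.
     * Returns true if a newer state was loaded, so a retry is worthwhile;
     * false means the caller should fail fast with the view it has.
     */
    boolean refreshAfterLeaderFailure() {
        long now = clock.getAsLong();
        if (now - cached.fetchedAtMs < minRefreshIntervalMs) {
            return false; // cache is younger than the threshold: do not hit ZK
        }
        int latest = source.currentVersion();
        if (latest == cached.version) {
            // Nothing changed: restamp the cache and skip the full read.
            cached = new CachedState(cached.version, cached.leaderUrl, now);
            return false;
        }
        cached = load(latest);
        return true;
    }

    // Full read. A real client would fetch data and version in one call;
    // they are separate here only to keep the sketch small.
    private CachedState load(int version) {
        fullReads++;
        return new CachedState(version, source.currentLeaderUrl(), clock.getAsLong());
    }
}
```

With this shape a client whose leader just died pays at most one version check per threshold interval, and a full {{state.json}} read only when the version actually moved.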

> CloudSolrClient's cluster state cache can break direct updates to leaders
> -------------------------------------------------------------------------
>                 Key: SOLR-9512
>                 URL: https://issues.apache.org/jira/browse/SOLR-9512
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Alan Woodward
>         Attachments: SOLR-9512.patch
> This is the root cause of SOLR-9305 and (at least some of) SOLR-9390.  The 
> process goes something like this:
> Documents are added to the cluster via a CloudSolrClient, with 
> directUpdatesToLeadersOnly set to true.  CSC caches its view of the 
> DocCollection.  The leader then goes down, and is reassigned.  Next time 
> documents are added, CSC checks its cache again, and gets the old view of the 
> DocCollection.  It then tries to send the update directly to the old, now 
> down, leader, and we get ConnectionRefused.
