Alan Woodward commented on SOLR-9512:

Right, there are two scenarios:
1. The leader is down, and there's no replacement voted for yet, in which case 
things happen much as you describe above
2. The old leader is down, a new leader has been selected, but the cache hasn't 
updated yet.  In this case the update actually succeeds, as it's passed to the 
next node in the list and then forwarded on to the relevant leader.
In both cases, we need to invalidate the cache.
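The invalidate-and-retry shape described above can be sketched roughly as follows. Note this is a minimal illustration, not the actual SolrJ code: `ClusterStateCache`, `DirectUpdater`, `LeaderDownException`, and the `Sender` interface are all hypothetical stand-ins for `CloudSolrClient`'s internal state-cache and request-routing logic.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for a stale-leader connection failure.
class LeaderDownException extends RuntimeException {}

// Hypothetical stand-in for the client's cached view of the cluster state.
class ClusterStateCache {
    private final Map<String, String> leaders = new HashMap<>();
    private final Supplier<Map<String, String>> fetchFreshState; // e.g. a ZK read

    ClusterStateCache(Supplier<Map<String, String>> fetchFreshState) {
        this.fetchFreshState = fetchFreshState;
    }

    // Populate lazily from the (possibly updated) cluster state.
    synchronized String leaderFor(String collection) {
        return leaders.computeIfAbsent(collection,
                c -> fetchFreshState.get().get(c));
    }

    // Drop the cached entry so the next lookup re-reads the cluster state.
    synchronized void invalidate(String collection) {
        leaders.remove(collection);
    }
}

class DirectUpdater {
    interface Sender {
        void send(String node, String doc) throws LeaderDownException;
    }

    private final ClusterStateCache cache;
    private final Sender sender;

    DirectUpdater(ClusterStateCache cache, Sender sender) {
        this.cache = cache;
        this.sender = sender;
    }

    // Send to the cached leader; on a connection failure, invalidate the
    // stale cache entry and retry once against the freshly fetched leader.
    String update(String collection, String doc) {
        String leader = cache.leaderFor(collection);
        try {
            sender.send(leader, doc);
            return leader;
        } catch (LeaderDownException e) {
            cache.invalidate(collection);
            String newLeader = cache.leaderFor(collection);
            sender.send(newLeader, doc);
            return newLeader;
        }
    }
}
```

The key point is that the cache entry is only dropped when a send actually fails, so the common path still avoids a cluster-state read per update.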

Separately, there's a bit of cleanup we can do in the directUpdate() method 
call - at the moment we have two paths, depending on whether we're using 
parallel updates, and they end up throwing slightly different exceptions for 
the same failure types.  I'll open up another JIRA for that.

> CloudSolrClient's cluster state cache can break direct updates to leaders
> -------------------------------------------------------------------------
>                 Key: SOLR-9512
>                 URL: https://issues.apache.org/jira/browse/SOLR-9512
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public (Default Security Level. Issues are Public)
>            Reporter: Alan Woodward
> This is the root cause of SOLR-9305 and (at least some of) SOLR-9390.  The 
> process goes something like this:
> Documents are added to the cluster via a CloudSolrClient, with 
> directUpdatesToLeadersOnly set to true.  CSC caches its view of the 
> DocCollection.  The leader then goes down, and is reassigned.  Next time 
> documents are added, CSC checks its cache again, and gets the old view of the 
> DocCollection.  It then tries to send the update directly to the old, now 
> down, leader, and we get ConnectionRefused.

This message was sent by Atlassian JIRA
