dsmiley commented on PR #1484:
URL: https://github.com/apache/solr/pull/1484#issuecomment-1482259248
CHANGES.txt suggestion as an Improvement (JIRA also shows as Improvement):
* SOLR-11685: When SolrCloud shard leaders change while indexing updates
arrive, Solr could fail and return
an HTTP 503 status. Switched to 510 so that CloudSolrClient will
auto-retry the request and probably succeed.
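To illustrate why the status code matters to the client, here is a minimal sketch of a retry policy that treats 510 as retryable and 503 as final. It is not CloudSolrClient's actual implementation; the class `RetrySketch`, the methods `isRetryable` and `sendWithRetry`, and the `maxRetries` parameter are hypothetical names used only for illustration.

```java
import java.util.function.IntSupplier;

// Hypothetical sketch, not Solr code: models a client that retries only on 510.
class RetrySketch {
    static final int SERVICE_UNAVAILABLE = 503;
    static final int INVALID_STATE = 510;

    // Assumed policy: only 510 signals "stale cluster state, worth retrying".
    static boolean isRetryable(int status) {
        return status == INVALID_STATE;
    }

    // Sends the request once, then retries up to maxRetries times while
    // the returned status is retryable. Returns the last status seen.
    static int sendWithRetry(IntSupplier request, int maxRetries) {
        int status = request.getAsInt();
        for (int attempt = 0; attempt < maxRetries && isRetryable(status); attempt++) {
            // A real client would refresh its view of the shard leaders here.
            status = request.getAsInt();
        }
        return status;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated server: first attempt hits the leader change, second succeeds.
        int status = sendWithRetry(() -> calls[0]++ == 0 ? INVALID_STATE : 200, 3);
        System.out.println("final status=" + status + " after " + calls[0] + " attempts");
    }
}
```

Under this assumed policy a 503 is returned to the caller after a single attempt, while a 510 gives the request another chance once leadership has settled.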
Based on the errors from some rare flapping tests, I believe this can be
just an improvement. To be honest, though, I have not encountered the issue
in that form; I see it as a serious bug that I might describe as follows:
* SOLR-11685: When SolrCloud shard leaders change while indexing updates
arrive, Solr could return
success to a client when it had actually failed to accept the update.
In the first case (just an improvement), it's likely the initial Solr node
itself was confused by the leader flip. In the second (a bug), the problem
occurs when the initial Solr node has to forward the message to another node
that is the leader (but doesn't quite know it yet). I'm debugging further to
clarify the impact of the bug with only this change applied. There is very
likely another bug covering a more general case, which would probably deserve
its own JIRA, or we could fold it into this one to keep the messaging to
users clear.
I could imagine a test we could beast that induces ZooKeeper session losses,
and thus Solr-side shard leadership changes, while indexing is coming in,
constantly checking whether each doc _actually_ makes it. Some of the chaos
tests show how to do the session-loss trick.
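The core check of such a test could be sketched roughly as follows. This is a plain-Java simulation, not a Solr test: the `Indexer` interface, `findLostDocs`, and `buggyIndexer` are hypothetical names, and the fake indexer merely stands in for a cluster whose leader changes mid-indexing by silently dropping some docs while still reporting success.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not Solr code: verifies that every acknowledged doc exists.
class AckVerificationSketch {
    // Stand-in for a cluster; a real test would index and query through a Solr client.
    interface Indexer {
        boolean add(String id);     // true = the client was told "success"
        boolean exists(String id);  // true = the doc is actually present
    }

    // Indexes numDocs docs and returns the ids that were acknowledged but are missing.
    static List<String> findLostDocs(Indexer indexer, int numDocs) {
        List<String> lost = new ArrayList<>();
        for (int i = 0; i < numDocs; i++) {
            String id = "doc" + i;
            boolean acked = indexer.add(id);
            if (acked && !indexer.exists(id)) {
                lost.add(id);
            }
        }
        return lost;
    }

    // Fake that drops every dropEvery-th doc yet always reports success,
    // mimicking the "success returned but update not accepted" failure mode.
    // A dropEvery of 0 or less never drops anything.
    static Indexer buggyIndexer(int dropEvery) {
        return new Indexer() {
            private final Set<String> stored = new HashSet<>();
            private int calls = 0;

            @Override
            public boolean add(String id) {
                calls++;
                boolean dropped = dropEvery > 0 && calls % dropEvery == 0;
                if (!dropped) {
                    stored.add(id);
                }
                return true; // always acknowledges, even when the doc was dropped
            }

            @Override
            public boolean exists(String id) {
                return stored.contains(id);
            }
        };
    }

    public static void main(String[] args) {
        System.out.println("lost docs: " + findLostDocs(buggyIndexer(4), 10));
    }
}
```

In a real test the fake would be replaced by actual indexing against a cluster with induced session losses, and the test would fail whenever the list of lost docs is non-empty.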
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]