One issue I see is that a TLOG non-leader can't simply be declared leader at any moment just because ZkShardTerms says it's up to date. It needs to replay its updateLog first. I don't think that can be deferred until it receives a doc to index, because queries assume the leader is up to date, certainly for real-time get (RTG) at least. The algorithm I listed could be changed so that the chosen replica is informed and can replay its updateLog itself before taking over. This also acts as an implicit extra liveness check that the chosen leader is reachable.
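
The change described above could be sketched roughly as follows. This is only an illustrative model in Python, not Solr code: `Replica`, `prepare_for_leadership`, and `elect_leader` are hypothetical names, the shard terms and live nodes are passed in as a plain dict and set rather than read from ZooKeeper, and eligibility is assumed to mean "holds the highest term recorded for the shard and sits on a live node".

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Replica:
    """Hypothetical stand-in for a TLOG replica (not a Solr class)."""
    name: str
    node: str
    update_log: List[str] = field(default_factory=list)  # buffered, not yet replayed
    index: List[str] = field(default_factory=list)       # what queries/RTG would see
    reachable: bool = True

    def prepare_for_leadership(self) -> bool:
        """Replay the update log so the replica is query-ready before it leads.

        Returns False when the replica cannot be reached, so the same call
        doubles as a liveness check on the chosen candidate.
        """
        if not self.reachable:
            return False
        self.index.extend(self.update_log)
        self.update_log.clear()
        return True


def elect_leader(replicas: List[Replica],
                 terms: Dict[str, int],
                 live_nodes: Set[str]) -> Optional[Replica]:
    """Pick an eligible replica and inform it; return the new leader or None."""
    if not terms:
        return None
    highest = max(terms.values())  # assumption: only highest-term replicas are eligible
    candidates = sorted(
        (r for r in replicas
         if r.node in live_nodes and terms.get(r.name) == highest),
        key=lambda r: r.name,
    )
    for candidate in candidates:
        # The candidate is told it was chosen and replays its own log.
        if candidate.prepare_for_leadership():
            return candidate  # a real system would record this in ZK here
    return None


if __name__ == "__main__":
    r1 = Replica("r1", "n1", update_log=["doc1"], reachable=False)
    r2 = Replica("r2", "n2", update_log=["doc1", "doc2"])
    leader = elect_leader([r1, r2], {"r1": 2, "r2": 2}, {"n1", "n2"})
    print(leader.name if leader else "no eligible leader")
```

In this model an unreachable candidate is simply skipped and the next eligible replica is tried, which is the "implicit extra liveness check" in action; if no eligible replica responds, no leader is recorded.
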
~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Fri, Oct 14, 2022 at 8:46 AM David Smiley <dsmi...@apache.org> wrote:

> And to emphasize the brilliance of ZkShardTerms (in turn, on RAFT which is
> its basis/genesis), we might not even need replica states. ZkShardTerms +
> live_nodes is probably enough. Credit to Dat on this; we were just in an
> email exchange about this stuff.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Fri, Oct 14, 2022 at 8:41 AM David Smiley <dsmi...@apache.org> wrote:
>
>> On Fri, Oct 14, 2022 at 6:11 AM Mark Miller <markrmil...@gmail.com>
>> wrote:
>>
>>> I don’t have much to say about the proposal, other than to say that if
>>> an election ever ends up involving syncing up and exchanging data,
>>> doing that just in time is probably less than ideal for most of the
>>> more common use cases.
>>>
>>
>> I should emphasize that replicas continue to want to sync-up with their
>> leaders on their own -- eagerly. See RecoveringCoreTermWatcher. I could
>> imagine an option to allow one to have this be lazy as well but I'm not
>> proposing that now.
>>
>>
>>> That’s just an aside though. I'd be more interested in seeing the
>>> proposal connect problems with solutions.
>>
>>
>> The #1 point of the proposal is robustness/stability and not a specific
>> bug. Instead of wondering why there isn't a leader, this proposal would
>> log information about the pertinent state and elect a leader if one is
>> eligible. No wondering why one wasn't elected. Still, of course we might
>> wonder other things based on the logged information, but you're
>> then closer to debugging what's happening. This is more user/operator
>> friendly.
>>
>>
>>> My quick read makes me think the goal is some dimension of scale (I’m
>>> guessing a lazy dimension, usually not the most common Solr
>>> architecture in my experience fwiw). But I don’t see what the problems
>>> are for that dimension of scale or how to connect proposals to
>>> solutions to the problems. Unless I’m just missing it.
>>>
>>
>> I think the proposal somewhat indirectly addresses a dimension of scale
>> involving tens of thousands of shards (2x more replicas) across a large
>> SolrCloud cluster, and using a simple node restart as an example. Assuming
>> replicas continue to try to be in sync based on ZkShardTerms (sorry I
>> wasn't clear on that), the actual choice of who is leader need not happen
>> eagerly; it's premature I say.
>>
>> BTW, the proposal's strategy is complementary with additional leader
>> algorithms being in-place, though I don't think we need both. The most
>> important mechanism is ZkShardTerms which is already in-place to govern
>> leader eligibility; it's brilliant. With that complicated problem already
>> solved, we're merely left with simply picking an eligible replica and doing
>> it (recording it in ZK) -- so just pick one already and be done with it :-)
>>
>