And to emphasize the brilliance of ZkShardTerms (and, in turn, of Raft, which is its basis/genesis), we might not even need replica states. ZkShardTerms + live_nodes is probably enough. Credit to Dat on this; we were just in an email exchange about this stuff.
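
To make "ZkShardTerms + live_nodes is probably enough" concrete, here is a rough sketch and not actual Solr code. It assumes a replica is eligible when its term equals the highest term recorded for the shard and its node is in live_nodes; it ignores recovery state entirely. The class, the pickLeader method, and the plain maps standing in for the ZK data are all hypothetical names made up for illustration.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;

public class LeaderPickSketch {

  /**
   * Hypothetical helper, not a Solr API.
   * terms:       replica name -> term (stand-in for what ZkShardTerms stores)
   * replicaNode: replica name -> node name hosting that replica
   * liveNodes:   stand-in for the live_nodes listing in ZK
   */
  static Optional<String> pickLeader(
      Map<String, Long> terms,
      Map<String, String> replicaNode,
      Set<String> liveNodes) {
    // The highest term is computed over ALL replicas, live or not. A live
    // replica that is behind the highest term must not become leader.
    long maxTerm = terms.values().stream().mapToLong(Long::longValue).max().orElse(-1L);

    return terms.entrySet().stream()
        .filter(e -> e.getValue() == maxTerm)             // up to date
        .map(Map.Entry::getKey)
        .filter(replicaNode::containsKey)                 // known placement
        .filter(r -> liveNodes.contains(replicaNode.get(r))) // node is live
        .sorted()                                         // deterministic pick
        .findFirst();
  }

  public static void main(String[] args) {
    Map<String, Long> terms = Map.of("core_node1", 3L, "core_node2", 3L, "core_node3", 2L);
    Map<String, String> replicaNode =
        Map.of("core_node1", "nodeA", "core_node2", "nodeB", "core_node3", "nodeC");

    // nodeA is down: core_node2 is live and has the highest term.
    System.out.println(pickLeader(terms, replicaNode, Set.of("nodeB", "nodeC")).orElse("none"));

    // Only nodeC is live: core_node3 is behind, so nobody is eligible.
    System.out.println(pickLeader(terms, replicaNode, Set.of("nodeC")).orElse("none"));
  }
}
```

The second case is the point: when no live replica holds the highest term, the sketch returns nothing rather than electing a stale replica, and that is the moment to log the terms and live nodes so an operator can see why.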
~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Fri, Oct 14, 2022 at 8:41 AM David Smiley <dsmi...@apache.org> wrote:

> On Fri, Oct 14, 2022 at 6:11 AM Mark Miller <markrmil...@gmail.com> wrote:
>
>> I don’t have much to say about the proposal, other than to say that if an
>> election ever ends up involving syncing up and exchanging data, doing that
>> just in time is probably less than ideal for most of the more common use
>> cases.
>>
>
> I should emphasize that replicas continue to want to sync-up with their
> leaders on their own -- eagerly. See RecoveringCoreTermWatcher. I could
> imagine an option to allow one to have this be lazy as well but I'm not
> proposing that now.
>
>
>> That’s just an aside though. I’d be more interested in seeing the proposal
>> connect problems with solutions.
>
>
> The #1 point of the proposal is robustness/stability and not a specific
> bug. Instead of wondering why there isn't a leader, this proposal would
> log information about the pertinent state and elect a leader if one is
> eligible. No wondering why one wasn't elected. Still, of course we might
> wonder other things based on the logged information, but you're
> then closer to debugging what's happening. This is more user/operator
> friendly.
>
>
>> My quick read makes me think the goal is
>> some dimension of scale (I’m guessing a lazy dimension, usually not the
>> most common Solr architecture in my experience fwiw). But I don’t see what
>> the problems are for that dimension of scale or how to connect proposals to
>> solutions to the problems. Unless I’m just missing it.
>>
>
> I think the proposal somewhat indirectly addresses a dimension of scale
> involving tens of thousands of shards (2x more replicas) across a large
> SolrCloud cluster, and using a simple node restart as an example. Assuming
> replicas continue to try to be in sync based on ZkShardTerms (sorry I
> wasn't clear on that), the actual choice of who is leader need not happen
> eagerly; it's premature I say.
>
> BTW, the proposal's strategy is complementary with additional leader
> algorithms being in-place, though I don't think we need both. The most
> important mechanism is ZkShardTerms which is already in-place to govern
> leader eligibility; it's brilliant. With that complicated problem already
> solved, we're merely left with simply picking an eligible replica and doing
> it (recording it in ZK) -- so just pick one already and be done with it :-)
>