Do typical SolrCloud setups (users) not care about avoiding data loss in the cluster, or are clusters kept stable enough that these issues simply do not happen? I feel that moving to a cloud infra drives more instability, and I suspect these issues will become more prevalent/problematic then.
Indeed Terms (LIR) and shard leader election serve the same purpose. Maybe one could go...

I'd like to have a cluster where some replicas are not loaded in memory (transient cores) and do not even participate in all the ZK activity until they're needed. This would open the door to a ridiculously high number of replicas (limited by disk size, not by memory, CPU, connections, threads or anything else). That would serve a search hosting use case well, where most tenants are not active at any given time but there are many of them. I don't know if it's a realistic evolution of SolrCloud or if it should be considered science fiction at this stage.

Ilan

On Tue, Oct 5, 2021 at 7:33 AM Mark Miller <[email protected]> wrote:

> Well, the replicas are still waiting for the leader, so it's not no wait - you just don't have leaders waiting for full shards, which lessens the problem. A leader should have no wait and should release any waiting replicas.

> That core sorter should be looking at LIR to start the leader-capable replicas first.

> Mark

> On Tue, Oct 5, 2021 at 12:10 AM Mark Miller <[email protected]> wrote:

>> On Mon, Oct 4, 2021 at 5:24 AM Ilan Ginzburg <[email protected]> wrote:

>>> Thanks Mark for your write-ups! This is an area of SolrCloud I'm currently actively exploring at work (I might publish my notes as well at some point).

>>> I think the terms value (fitness to become leader) should participate in the election node ordering, as well as a terms goal (based on the highest known term for the shard), along with clear options for stalled election vs. data loss (and if data loss is not acceptable, an external process, automated or human, likely has to intervene to unblock the stalled election).

>> There should only be a stalled election if no one can become leader for some odd reason - agreed it would be good to be able to detect that vs. cycling forever.

>> I've also always thought that should probably be a configuration. If you know a replica with more data is absent, you just bail (currently it will wait and then continue), or an operator could configure it to continue with the data you have regardless.

>> I used to think about those things, but it's a boring area to care about or work on given no one else does.

>>> Even if all updates hit multiple replicas, nothing guarantees that any of these copies is present when another replica (without the update) starts. If we don't want to wait at startup for other replicas to join an election (this can't scale, even though CoreSorter does its best... but it is the most convoluted Comparator I've ever seen), we might need the notion of an "incomplete leader", i.e. a replica that is the current elected leader but does not have all the data (at some later point we might decide to accept the loss and consider it the leader, or, when a better-positioned replica joins, have that one become leader). This will require revisiting quite a few assumptions, so it should likely be paired with a thorough clean-up (and a move to Curator election?).

>> A new replica that comes up should fit into the above logic - it will have a term of 0, other replicas will have higher terms, and you will know that from ZK - so either you fail shard startup or, like today, the other replicas are not coming and you continue on.

>> I don't think that core sorter does the best sort, based on the last time I looked at it.
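A minimal sketch of the term-aware startup ordering being discussed here, assuming a hypothetical ReplicaStartupInfo type standing in for whatever the real core descriptors and ZK shard terms expose (this is not Solr's actual CoreSorter):

import java.util.Comparator;
import java.util.List;

// Sketch only: order cores for startup so that, within each shard, the replica
// with the highest known (LIR) term comes first and can take leadership without
// waiting on the rest. ReplicaStartupInfo is a hypothetical stand-in, not a Solr type.
class ReplicaStartupInfo {
  final String collection;
  final String shardId;
  final long term; // highest term recorded for this replica in ZK shard terms

  ReplicaStartupInfo(String collection, String shardId, long term) {
    this.collection = collection;
    this.shardId = shardId;
    this.term = term;
  }
}

class TermAwareCoreSorter {
  // Group cores by collection and shard, then start the highest-term replica first.
  static void sortForStartup(List<ReplicaStartupInfo> cores) {
    cores.sort(
        Comparator.comparing((ReplicaStartupInfo r) -> r.collection)
            .thenComparing(r -> r.shardId)
            .thenComparing(Comparator.comparingLong((ReplicaStartupInfo r) -> r.term).reversed()));
  }
}

Whether something like this belongs in the core sorter or in the election path itself is exactly the open question in this thread.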
>> It used to actually matter much more, though - because the shard could not start until all the replicas were up, so with many replicas, shards, and collections, depending on order you could easily have to start a huge number of cores to complete a shard before it got moving.

>> But that issue should not be nearly the issue that it was. That's again from a world with no LIR info. There should be no reason to wait for all replicas today, no reason to have them all involved in a sync. Any replica with the highest LIR term can become leader, so sorting to complete shards might be nice, but it should not be needed like it was. (It still is needed, but again because we still wait when we don't need to.)

>> Technically, you could do a much simpler leader election. With an overseer, it could just pick the leader. With or without one, a replica that sees it has the highest term could just try to create the leader node - first one wins.

>> The current leader election is a recipe ZK promotes to avoid a thundering herd effect - you can have tons of participants and it's an efficient flow vs. 100 participants fighting to see who creates a ZK node every new election.

>> But generally we have 3 replicas. Some outlier users might use more, but even still it's not going to be that many.

>> Mark

>>> Ilan

>>> On Sun, Oct 3, 2021 at 4:27 AM Mark Miller <[email protected]> wrote:

>>>> I filed https://issues.apache.org/jira/browse/SOLR-15672 Leader Election is flawed - for future reference if anyone looks at tackling leader election issues. I'll drop a couple of notes and random suggestions there.

>>>> Mark

>>>> On Sat, Oct 2, 2021 at 12:47 PM Mark Miller <[email protected]> wrote:

>>>>> At some point digging through some of this stuff, I often start to think: I wonder how good our tests are at catching certain categories of problems. Various groups of code branches and behaviors.

>>>>> I do notice that as I get the tests flying, they do start to pick up a lot more issues. A lot more bugs and bad behavior. And as they start to near max out, I start feeling a little better about a lot of it. But then I'm looking at things outside of tests still as well. Using my own tools and setups, using stuff from others. Being cruel in my expectations. And by then I've come a long way, but I can still find problems. Run into bad situations. If I push, and when I make it so I can push harder, I push even harder. And I want the damn thing solid. Why come all this way if I can't have it really and truly solid? And that's when I reach for collection creation mixed with cluster restarts. How about I shove 100,000 SolrCores, 10,000 collections right down its mouth on a handful of instances on a single machine in like a minute timeframe. How about 30 seconds. How about more collections. How about lower time frames. Vary things around. Let's just swamp it and demand the setup eats it in silly time frames and stands up at the end correct and happy. And then I start to get to the bottom of the barrel on what's subverting my solidness. But as I've always said, more and better targeted tests, along with simpler and more understandable implementations, will also cover a lot more ground. I certainly have pushed on simpler implementations.
>>>>> I've never gotten to the point where I have the energy and time to just push on more, better and more targeted tests, more unit tests, more Mockito, more Awaitility as Tim suggested, etc.

>>>>> --
>>>>> - Mark
>>>>> http://about.me/markrmiller

>>>> --
>>>> - Mark
>>>> http://about.me/markrmiller

>> --
>> - Mark
>> http://about.me/markrmiller

> --
> - Mark
> http://about.me/markrmiller
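A rough sketch of the "first one wins" election Mark describes earlier in the thread (a replica that sees it has the highest term simply tries to create the leader node), written against the plain ZooKeeper client. The znode path layout and the term arguments are made up for illustration; this is not Solr's actual election code:

import java.nio.charset.StandardCharsets;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

// Sketch of "first one wins" leader claiming: a replica that has the highest
// known term for its shard tries to create an ephemeral leader znode; if the
// node already exists, another replica got there first. The path and term
// check are illustrative only, not Solr's real election protocol.
class SimpleLeaderClaim {

  private final ZooKeeper zk;

  SimpleLeaderClaim(ZooKeeper zk) {
    this.zk = zk;
  }

  /** Returns true if this replica became leader, false if it is unfit or another replica won. */
  boolean tryClaimLeader(String shardPath, String coreNodeName, long myTerm, long highestKnownTerm)
      throws KeeperException, InterruptedException {
    if (myTerm < highestKnownTerm) {
      // A better-positioned replica exists; don't claim, to avoid silent data loss.
      return false;
    }
    try {
      // Ephemeral: if this process dies, the leader node goes away and a new claim can happen.
      zk.create(shardPath + "/leader", coreNodeName.getBytes(StandardCharsets.UTF_8),
          ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
      return true; // we created the node: first one wins
    } catch (KeeperException.NodeExistsException e) {
      return false; // another replica already claimed leadership for this shard
    }
  }
}

With the usual 3 replicas per shard, the thundering herd that the sequential-node election recipe guards against stays small, which is the trade-off being pointed out above.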
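And on the "more Awaitility" point at the end of the thread, a small sketch of the pattern it suggests: poll a cluster condition with a bounded timeout instead of sleeping. The ClusterProbe interface and its isShardStable() check are hypothetical, purely to show the shape of such a test helper:

import static org.awaitility.Awaitility.await;

import java.time.Duration;

// Sketch only: wait for a condition with a timeout instead of Thread.sleep in tests.
// ClusterProbe and isShardStable() are hypothetical placeholders.
class ClusterTestSupport {

  interface ClusterProbe {
    boolean isShardStable(String collection, String shardId); // hypothetical check
  }

  static void waitForStableShard(ClusterProbe probe, String collection, String shardId) {
    await("shard " + shardId + " of " + collection + " to become stable")
        .atMost(Duration.ofSeconds(30))
        .pollInterval(Duration.ofMillis(250))
        .until(() -> probe.isShardStable(collection, shardId));
  }
}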
