I have also recently worked with a team who lost critical data as a result of gossip issues combined with a collision in our token allocation. I haven't filed a JIRA yet as it slipped my mind, but I've seen it in my own testing as well. I'll get a JIRA in describing it in detail.
It's severe enough that it should probably block 5.0.

Jon

On Thu, May 16, 2024 at 10:37 AM Jordan West <jw...@apache.org> wrote:

> I'm a big +1 on 18917 or more testing of gossip. While I appreciate that
> it makes TCM more complicated, gossip and schema propagation bugs have been
> the source of our two worst data loss events in the last 3 years. Data loss
> should immediately cause us to evaluate what we can do better.
>
> We will likely live with gossip for at least 1, maybe 2, more years.
> Otherwise, outside of bug fixes (and to some degree even still), I think the
> only other solution is to not touch gossip *at all* until we are all
> TCM-only, which I don't think is practical or realistic. Recent changes to
> gossip in 4.1 introduced several subtle bugs that had serious impact (from
> data loss to loss of the ability to safely replace nodes in the cluster).
>
> I am happy to contribute some time to this if lack of folks is the issue.
>
> Jordan
>
> On Mon, May 13, 2024 at 17:05 David Capwell <dcapw...@apple.com> wrote:
>
>> So, I created https://issues.apache.org/jira/browse/CASSANDRA-18917, which
>> lets you do deterministic gossip simulation testing across large clusters
>> within seconds… I stopped this work as it conflicted with TCM (they were
>> trying to merge that week) and it hit issues where some nodes never
>> converged… I didn't have time to debug, so I had to drop the patch…
>>
>> This type of change would be a good reason to resurrect that patch, as
>> testing gossip is super dangerous right now… its behavior is only in a few
>> people's heads, and even then it's just bits and pieces scattered across
>> multiple people (and likely missing pieces)…
>>
>> My brain is far too fried right now to say whether your idea is safe or
>> not, but I honestly feel that we would need to improve our tests (we have
>> 0) before making such a change…
>>
>> I do welcome the patch though...
>>
>>
>> On May 12, 2024, at 8:05 PM, Zemek, Cameron via dev <
>> dev@cassandra.apache.org> wrote:
>>
>> In looking into CASSANDRA-19580 I noticed something that raises a
>> question. When handling a gossip SYN, it doesn't check for missing
>> digests. If the digest list is empty, as in a shadow round, it will add
>> everything from endpointStateMap to the reply. But why not include missing
>> entries in normal replies as well? The branching for reply handling of SYN
>> requests could then be merged into a single code path (though the shadow
>> round handles empty state differently with CASSANDRA-16213). The potential
>> downside is a performance impact, as this requires doing a set difference.
>>
>> For example, something along the lines of:
>>
>> ```
>> // Endpoints we know about that the SYN sender did not mention
>> Set<InetAddressAndPort> missing = new HashSet<>(endpointStateMap.keySet());
>> missing.removeAll(gDigestList.stream()
>>                              .map(GossipDigest::getEndpoint)
>>                              .collect(Collectors.toSet()));
>> for (InetAddressAndPort endpoint : missing)
>> {
>>     // generation 0 / version 0 marks the endpoint as unknown to the SYN sender
>>     gDigestList.add(new GossipDigest(endpoint, 0, 0));
>> }
>> ```
>>
>> It seems odd to me that after the shadow round a new node has an
>> endpointStateMap with only itself as an entry. Then the only way it gets
>> the gossip state is by another node choosing to send the new node a gossip
>> SYN, and that choice is random. Yeah, this happens every second, so
>> eventually it's going to receive one (outside the issue of CASSANDRA-19580,
>> where it doesn't if it's in a dead state like hibernate), but doesn't this
>> open up bootstrapping to failures on very large clusters, as it can take
>> longer before the node is sent a SYN (since the odds of being chosen for a
>> SYN get lower)?
>> For years I have been seeing bootstrap failures with 'Unable to contact
>> any seeds', but they are infrequent and I have never been able to figure
>> out how to reproduce them in order to open a ticket. I wonder if some of
>> them have been due to the node not receiving a SYN message before it does
>> the seenAnySeed check.
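
For reference, here is a minimal, self-contained sketch of the set-difference idea in the snippet above. Endpoint strings and the Digest record are stand-ins for Cassandra's InetAddressAndPort and GossipDigest; the real change would live in the Gossiper SYN-handling path, so take this as an illustration of the shape of the logic rather than the actual patch.

```
import java.util.*;
import java.util.stream.Collectors;

// Sketch: append placeholder digests for endpoints the SYN sender did not mention.
// Digest is a stand-in for Cassandra's GossipDigest; endpoints are plain strings here.
public class MissingDigestSketch
{
    record Digest(String endpoint, int generation, int maxVersion) {}

    static void addMissingDigests(Set<String> knownEndpoints, List<Digest> gDigestList)
    {
        // Endpoints this node knows about that were not mentioned in the incoming SYN.
        Set<String> mentioned = gDigestList.stream()
                                           .map(Digest::endpoint)
                                           .collect(Collectors.toSet());
        Set<String> missing = new HashSet<>(knownEndpoints);
        missing.removeAll(mentioned);

        // A generation 0 / version 0 digest marks the endpoint as unknown to the sender,
        // so (presumably) the normal delta logic would include its full state in the reply.
        for (String endpoint : missing)
            gDigestList.add(new Digest(endpoint, 0, 0));
    }

    public static void main(String[] args)
    {
        Set<String> known = new HashSet<>(List.of("10.0.0.1", "10.0.0.2", "10.0.0.3"));
        List<Digest> syn = new ArrayList<>(List.of(new Digest("10.0.0.1", 5, 12)));
        addMissingDigests(known, syn);
        // The two endpoints absent from the SYN now carry (generation 0, version 0) digests.
        System.out.println(syn);
    }
}
```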