I’m a big +1 on 18917 or more testing of gossip. While I appreciate that it
makes TCM more complicated, gossip and schema propagation bugs have been
the source of our two worst data loss events in the last 3 years. Data loss
should immediately cause us to evaluate what we can do better.

We will likely live with gossip for at least 1, maybe 2, more years.
Otherwise, outside of bug fixes (and to some degree even then), I think the
only other solution is to not touch gossip *at all* until we are all
TCM-only, which I don’t think is practical or realistic. Recent changes to
gossip in 4.1 introduced several subtle bugs that had serious impact (from
data loss to loss of ability to safely replace nodes in the cluster).

I am happy to contribute some time to this if lack of folks is the issue.

Jordan

On Mon, May 13, 2024 at 17:05 David Capwell <dcapw...@apple.com> wrote:

> So, I created https://issues.apache.org/jira/browse/CASSANDRA-18917 which
> lets you do deterministic gossip simulation testing across large clusters
> within seconds… I stopped this work as it conflicted with TCM (they were
> trying to merge that week) and it hit issues where some nodes never
> converged… I didn’t have time to debug so I had to drop the patch…
>
> This type of change would be a good reason to resurrect that patch as
> testing gossip is super dangerous right now… its behavior is only in a few
> people's heads, and even then it's just bits and pieces scattered across
> multiple people (and likely missing pieces)…
>
> My brain is far too fried right now to say whether your idea is safe or
> not, but I honestly feel that we would need to improve our tests (we have
> 0) before making such a change…
>
> I do welcome the patch though...
>
>
> On May 12, 2024, at 8:05 PM, Zemek, Cameron via dev <
> dev@cassandra.apache.org> wrote:
>
> In looking into CASSANDRA-19580 I noticed something that raises a
> question. The handling of a Gossip SYN doesn't check for missing digests.
> If it's empty for a shadow round, it will add everything from
> endpointStateMap to the reply. But why not include missing entries in
> normal replies as well? The branching for reply handling of SYN requests
> could then be merged into a single code path (though the shadow round
> handles empty state differently with CASSANDRA-16213). A potential
> downside is the performance impact, as this requires doing a set
> difference.
>
> For example, something along the lines of:
>
> ```
>         // Endpoints known locally that the incoming SYN has no digest for.
>         Set<InetAddressAndPort> missing = new HashSet<>(endpointStateMap.keySet());
>         missing.removeAll(gDigestList.stream()
>                                      .map(GossipDigest::getEndpoint)
>                                      .collect(Collectors.toSet()));
>         // Add a placeholder digest (generation 0, version 0) for each one.
>         for (InetAddressAndPort endpoint : missing)
>         {
>             gDigestList.add(new GossipDigest(endpoint, 0, 0));
>         }
> ```
>
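A self-contained sketch of the same set-difference idea, for trying it outside the Cassandra tree. `Digest`, `withMissing` and the `String` endpoint keys are hypothetical stand-ins for `GossipDigest` and `InetAddressAndPort`, not the real classes, and the real SYN handler does more than this. It removes endpoints one by one rather than collecting an intermediate set, which is only one possible way to do the difference:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MissingDigests
{
    // Hypothetical stand-in for GossipDigest: endpoint, generation, max version.
    public static final class Digest
    {
        public final String endpoint;
        public final int generation;
        public final int maxVersion;

        public Digest(String endpoint, int generation, int maxVersion)
        {
            this.endpoint = endpoint;
            this.generation = generation;
            this.maxVersion = maxVersion;
        }
    }

    // Returns the received digests followed by a (0, 0) placeholder digest for
    // every locally known endpoint that the sender did not mention.
    public static List<Digest> withMissing(Set<String> knownEndpoints, List<Digest> received)
    {
        Set<String> missing = new HashSet<>(knownEndpoints);
        for (Digest d : received)
            missing.remove(d.endpoint);

        List<Digest> result = new ArrayList<>(received);
        for (String endpoint : missing)
            result.add(new Digest(endpoint, 0, 0));
        return result;
    }

    public static void main(String[] args)
    {
        // A new node that only knows about itself sends a single digest.
        Set<String> known = Set.of("10.0.0.1", "10.0.0.2", "10.0.0.3");
        List<Digest> fromNewNode = List.of(new Digest("10.0.0.1", 1715558700, 5));
        for (Digest d : withMissing(known, fromNewNode))
            System.out.println(d.endpoint + " " + d.generation + " " + d.maxVersion);
    }
}
```

The placeholders come out in no particular order because they are taken from a `HashSet`; the digests that were actually received keep their original order.
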
> It seems odd to me that after the shadow round a new node has an
> endpointStateMap with only itself as an entry. The only way it then gets
> the gossip state is by another node choosing to send the new node a gossip
> SYN, and that choice is random. Yes, this happens every second, so
> eventually it's going to receive one (outside the issue of CASSANDRA-19580,
> where it doesn't if it's in a dead state like hibernate), but doesn't this
> open up bootstrapping to failures on very large clusters, as it can take
> longer before it's sent a SYN (as the odds of being chosen for SYN get
> lower)? For years I have been seeing bootstrap failures with 'Unable to
> contact any seeds', but they are infrequent and I have never been able to
> figure out how to reproduce them in order to open a ticket. I wonder if
> some of them have been due to not receiving a SYN message before the node
> does the seenAnySeed check.
>
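A back-of-the-envelope sketch of the "odds of being chosen" concern. It rests on a deliberately simplified model that is an assumption, not a description of the actual Gossiper: each of `k` nodes that already know about the new node sends one SYN per round to a peer picked uniformly at random among the other `n - 1` nodes, independently of everything else. Under that model the chance that the new node receives no SYN in `t` rounds is (1 - 1/(n - 1))^(k * t). The class and method names are made up for illustration:

```java
public class SynOdds
{
    // Probability that one particular node receives no SYN in `rounds` rounds,
    // assuming `sendersAwareOfNode` senders each pick one of the other
    // (clusterSize - 1) nodes uniformly at random per round (simplified model).
    public static double probNoSyn(int clusterSize, int sendersAwareOfNode, int rounds)
    {
        if (clusterSize < 2 || sendersAwareOfNode < 0 || rounds < 0)
            throw new IllegalArgumentException("need clusterSize >= 2 and non-negative counts");
        double missOnce = 1.0 - 1.0 / (clusterSize - 1);
        return Math.pow(missOnce, (double) sendersAwareOfNode * rounds);
    }

    public static void main(String[] args)
    {
        // Hypothetical scenario: 3 nodes know about the new node, 30 rounds.
        int[] sizes = {10, 100, 1000};
        for (int n : sizes)
            System.out.printf("n=%d  P(no SYN in 30 rounds) = %.3f%n", n, probNoSyn(n, 3, 30));
    }
}
```

With `k` fixed, the probability of hearing nothing grows toward 1 as `n` grows, which is the shape of the concern above. If instead every other node is a potential sender (`k = n - 1`), the value approaches e^(-t) for large `n` and barely depends on cluster size. Which regime applies depends on how many nodes actually know about the new node after the shadow round, and the model does not answer that.
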