Brandon and I have been talking in CASSANDRA-18913 and here is the current 
plan; sharing for visibility

There are two bugs: 
1) restart and seeing a shutdown event before gossip has settled for you will 
create a partial EndpointState which leads to failed startup
2) shutdown corrupts state due to every node modifying their local copy (this 
may impact host replacements)

For #1 this can be fixed without changing networking, so will create a small 
patch for 4.x line
For #2 it does require a network change, so will add this to 5.0 with mixed 
mode support 

> On Oct 6, 2023, at 4:18 PM, David Capwell <dcapw...@apple.com> wrote:
> 
>> Won't the replacement have a newer generation?
> 
> The replacement is a different instance.  I performs a shadow round with its 
> seeds and if they are impacted by this issue then they are missing tokens, so 
> we fail the host replacement… you can work around this by changing the seeds 
> to nodes that know the token.
> 
>> I don't think it is, this is just fixing a gossip bug, and we should do so 
>> in all affected versions.
> 
> Right now we sent NoPayload which is 0 bytes, but with the change we send 
> GossipShutdown which contains the whole EndpointState… I “feel” like 4.x can 
> not handle this but worth a test (after deserializing the message we have 
> extra bytes… won’t we get mad about that?)… the gossip handler doesn’t look 
> at the payload so as long as 4.x serialization can support this, then it 
> won’t be hard to back port to 4.x
> 
> 
>> On Oct 6, 2023, at 3:57 PM, Brandon Williams <dri...@gmail.com> wrote:
>> 
>> On Fri, Oct 6, 2023 at 5:50 PM David Capwell <dcapw...@apple.com> wrote:
>>> Lets say you now need to host replace node1
>> 
>> Won't the replacement have a newer generation?
>> 
>>> avoid peers mutating endpoint states they don’t own
>> 
>> This sounds reasonable to me.
>> 
>>> This would be a protocol change, so would need to make sure everyone is 
>>> cool with me doing this in 5.0.>
>> 
>> I don't think it is, this is just fixing a gossip bug, and we should
>> do so in all affected versions.
>> 
>> Kind Regards,
>> Brandon
> 

Reply via email to