[ 
https://issues.apache.org/jira/browse/CASSANDRA-21026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18062106#comment-18062106
 ] 

Arup Chauhan edited comment on CASSANDRA-21026 at 3/2/26 11:08 AM:
-------------------------------------------------------------------

Hi Sam Tunnicliffe, thanks for the clarification, and sorry for the earlier 
off-target scope.

  I understand this issue is specifically about legitimate address reuse after 
node removal, where Accord startup fails while building endpoint mapping from 
Directory state.

  Proposed minimal repro:

  - Node A/B/C cluster with Accord enabled
  - Decommission node B (address \{{10.0.0.2}}, old node ID \{{X}})
  - Later bootstrap a new node at the same address \{{10.0.0.2}} with a new 
node ID \{{Y}}
  - Startup fails with duplicate endpoint mapping (\{{Mapping already exists 
for ...}})
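
  To make the repro concrete, here is a toy model of the failure. It is only a 
sketch: \{{ToyMappingBuilder}}, the integer node IDs and the string addresses 
are hypothetical stand-ins, not the real \{{EndpointMapping.Builder}} types, and 
it assumes the invariant surfaces as an \{{IllegalArgumentException}}. The only 
part taken from the ticket is the "endpoint may appear only once" check and its 
message.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: none of these types are the real Cassandra/Accord classes.
public class EndpointReuseRepro
{
    static final class ToyMappingBuilder
    {
        // node id -> endpoint (hypothetical simplification of the real mapping)
        private final Map<Integer, String> mapping = new HashMap<>();

        // Mirrors the invariant quoted in the issue: an endpoint may be mapped only once.
        void add(String endpoint, int nodeId)
        {
            if (mapping.containsValue(endpoint))
                throw new IllegalArgumentException("Mapping already exists for " + endpoint);
            mapping.put(nodeId, endpoint);
        }
    }

    // Returns the failure message, or null if the mapping was built without error.
    static String buildMapping()
    {
        ToyMappingBuilder builder = new ToyMappingBuilder();
        try
        {
            builder.add("10.0.0.1", 1); // node A
            builder.add("10.0.0.2", 4); // new node at the reused address (ID Y)
            builder.add("10.0.0.3", 3); // node C
            builder.add("10.0.0.2", 2); // removed node B (ID X), still kept in the directory
            return null;
        }
        catch (IllegalArgumentException e)
        {
            return e.getMessage();
        }
    }

    public static void main(String[] args)
    {
        System.out.println(buildMapping()); // prints "Mapping already exists for 10.0.0.2"
    }
}
```

  In this model the order of the two \{{10.0.0.2}} entries does not matter; 
whichever is added second trips the check.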

  I can take this in a focused scope:

  1. Reproduce with a minimal test/scenario
  2. Implement a targeted fix in 
\{{AccordTopology::directoryToEndpointMapping}} / \{{EndpointMapping.Builder}} 
for active vs removed nodes
  3. Add regression coverage to verify:

  - startup succeeds when an address is reused
  - removed-node IDs remain resolvable for inflight transaction completion

  If this matches your expectation, I’ll open a small PR against trunk and link 
it here.
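
  As a rough illustration of the two regression properties in step 3, the 
sketch below shows one possible direction only; it is not an agreed design and 
not the actual fix. It assumes removed nodes only need to be resolvable by ID, 
so they do not claim the address-to-ID direction. \{{ToyMapping}}, 
\{{addActive}}, \{{addRemoved}}, \{{endpointOf}} and \{{activeIdOf}} are 
hypothetical names, not existing Cassandra APIs.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: a hypothetical shape for "active vs removed" handling.
public class AddressReuseSketch
{
    static final class ToyMapping
    {
        private final Map<Integer, String> idToEndpoint = new HashMap<>();
        private final Map<String, Integer> activeEndpointToId = new HashMap<>();

        // Active nodes keep the uniqueness invariant on the endpoint.
        void addActive(String endpoint, int nodeId)
        {
            if (activeEndpointToId.containsKey(endpoint))
                throw new IllegalArgumentException("Mapping already exists for " + endpoint);
            activeEndpointToId.put(endpoint, nodeId);
            idToEndpoint.put(nodeId, endpoint);
        }

        // Assumption: removed nodes are recorded by id only, so a reused
        // address does not collide with the node that currently owns it.
        void addRemoved(String endpoint, int nodeId)
        {
            idToEndpoint.putIfAbsent(nodeId, endpoint);
        }

        String endpointOf(int nodeId)
        {
            return idToEndpoint.get(nodeId);
        }

        Integer activeIdOf(String endpoint)
        {
            return activeEndpointToId.get(endpoint);
        }
    }

    static ToyMapping build()
    {
        ToyMapping mapping = new ToyMapping();
        mapping.addActive("10.0.0.1", 1);  // node A
        mapping.addActive("10.0.0.2", 4);  // new node, ID Y
        mapping.addActive("10.0.0.3", 3);  // node C
        mapping.addRemoved("10.0.0.2", 2); // removed node B, ID X
        return mapping;
    }

    public static void main(String[] args)
    {
        ToyMapping mapping = build();                       // property 1: build succeeds
        System.out.println(mapping.endpointOf(2));          // property 2: removed ID resolves
        System.out.println(mapping.activeIdOf("10.0.0.2")); // address resolves to the new node
    }
}
```

  Whether the real mapping can tolerate an address that resolves to more than 
one ID is exactly what the repro and review would need to confirm.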


was (Author: JIRAUSER312424):
Hi [~samt] , thanks for the clarification, and sorry again for the earlier 
off-target scope.

  I understand that this is about legitimate address reuse after node removal, 
and Accord startup failing when endpoint mapping is built from Directory state.

  I am thinking of a minimal scenario to reproduce:

  - Node A/B/C cluster with Accord enabled
  - Decommission node B (address \{{10.0.0.2}}, old node ID \{{X}})
  - Later bootstrap a new node at the same address \{{10.0.0.2}} with a new 
node ID \{{Y}}
  - On startup, Accord mapping build currently fails with duplicate endpoint 
mapping (\{{Mapping already exists for ...}})

  I can take this in a focused way:

  1. Reproduce with a minimal test/scenario
  2. Implement a targeted fix in 
\{{AccordTopology::directoryToEndpointMapping}} / \{{EndpointMapping.Builder}} 
handling of active vs removed nodes
  3. Add regression coverage to verify:
      - startup succeeds when an address is reused by a new node, and
      - removed-node IDs remain available/resolvable for inflight transaction 
completion

  If this matches what you had in mind, I’ll open a small PR against trunk and 
link it here.

> Reusing the address of a removed node is not possible with Accord enabled
> -------------------------------------------------------------------------
>
>                 Key: CASSANDRA-21026
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-21026
>             Project: Apache Cassandra
>          Issue Type: Bug
>          Components: Accord, Cluster/Membership
>            Reporter: Sam Tunnicliffe
>            Priority: Normal
>
> If the address of a decommissioned node is re-used by another new node at 
> some later time, any node in the cluster with Accord enabled will be unable 
> to start up, including the new node.
> As the new node comes up and registers with the {{ClusterMetadataService}} it 
> is added to the {{Directory}}.
> The decommissioned node's details are also preserved in the directory to 
> ensure that transactions which were in-flight can be completed after the 
> node has left.
> (https://issues.apache.org/jira/browse/CASSANDRA-20142)
> During AccordService initialization, building the endpoint mapping will fail 
> because of this check in {{EndpointMapping.Builder}}:
> {code}
> Invariants.requireArgument(!mapping.containsValue(endpoint),
>                            "Mapping already exists for %s", endpoint);
> {code}
> Additionally, it seems possible that the wrong method is being called in 
> {{AccordTopology::directoryToEndpointMapping}}:
> {code}
>        // There are cases where nodes are removed from the cluster (host replacement, decom, etc.),
>        // but inflight events may still be happening; keep the ids around so pending events do
>        // not fail with a mapping error
>        for (Directory.RemovedNode removedNode : directory.removedNodes())
>            builder.add(removedNode.endpoint, tcmIdToAccord(removedNode.id));
> {code}
> which should probably call {{builder::removed}} rather than {{builder::add}}, 
> but that also contains the same invariant check.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
