[
https://issues.apache.org/jira/browse/CASSANALYTICS-175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18088191#comment-18088191
]
Yifan Cai commented on CASSANALYTICS-175:
-----------------------------------------
+1
> Bulk write jobs fail when a node returns with a different IP address
> --------------------------------------------------------------------
>
> Key: CASSANALYTICS-175
> URL: https://issues.apache.org/jira/browse/CASSANALYTICS-175
> Project: Apache Cassandra Analytics
> Issue Type: Bug
> Components: Writer
> Reporter: Jon Haddad
> Assignee: Jon Haddad
> Priority: Normal
> Time Spent: 10m
> Remaining Estimate: 0h
>
> During S3 bulk writes, CassandraTopologyMonitor polls the cluster topology
> every 5 seconds and cancels the job if the current topology is not equal to
> the topology captured at job start. The comparison is
> TokenRangeMapping.equals, which compares instance sets using
> RingInstance.equals — and RingInstance equality includes the node's IP
> address.
> A node that goes down and rejoins with a different IP address (routine in
> Kubernetes, where a rescheduled pod keeps its hostname and host ID but gets a
> new IP) is the same logical instance, with the same token ownership. The
> write remains correct and safe to continue. But because the IP participates
> in equality, the monitor reports "Topology changed during bulk write" and
> fails the job. On clusters with hundreds of nodes across multiple DCs, the
> probability of at least one pod replacement during a long-running job makes
> this a frequent, spurious failure mode.
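
The failure described above can be reproduced in isolation. The following is a minimal sketch, not the project's code: `IpSensitiveInstance` and `TopologyCheckSketch` are hypothetical stand-ins for `RingInstance` and the monitor's comparison, and the hostnames, port, and addresses are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for RingInstance with the IP included in equality.
final class IpSensitiveInstance {
    final String fqdn;
    final int port;
    final String datacenter;
    final String ipAddress;

    IpSensitiveInstance(String fqdn, int port, String datacenter, String ipAddress) {
        this.fqdn = fqdn;
        this.port = port;
        this.datacenter = datacenter;
        this.ipAddress = ipAddress;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof IpSensitiveInstance)) return false;
        IpSensitiveInstance that = (IpSensitiveInstance) o;
        return port == that.port
            && fqdn.equals(that.fqdn)
            && datacenter.equals(that.datacenter)
            && ipAddress.equals(that.ipAddress); // the IP participates in identity
    }

    @Override
    public int hashCode() {
        return Objects.hash(fqdn, port, datacenter, ipAddress);
    }
}

final class TopologyCheckSketch {
    // Assumed rule, following the description: any inequality counts as a topology change.
    static boolean topologyChanged(Set<IpSensitiveInstance> atJobStart,
                                   Set<IpSensitiveInstance> current) {
        return !atJobStart.equals(current);
    }

    static boolean podRescheduleLooksLikeChange() {
        Set<IpSensitiveInstance> atJobStart = new HashSet<>(Arrays.asList(
            new IpSensitiveInstance("cassandra-0.example", 9042, "dc1", "10.0.0.5"),
            new IpSensitiveInstance("cassandra-1.example", 9042, "dc1", "10.0.0.6")));
        // cassandra-1 is rescheduled: same hostname, new IP.
        Set<IpSensitiveInstance> current = new HashSet<>(Arrays.asList(
            new IpSensitiveInstance("cassandra-0.example", 9042, "dc1", "10.0.0.5"),
            new IpSensitiveInstance("cassandra-1.example", 9042, "dc1", "10.0.0.9")));
        return topologyChanged(atJobStart, current);
    }

    public static void main(String[] args) {
        System.out.println(podRescheduleLooksLikeChange()); // prints true
    }
}
```

Nothing about token ownership differs between the two snapshots, yet the check reports a change.
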
> The monitor is not the only affected path:
> - {{RecordWriter.validateTaskTokenRangeMappings}} performs the same
> instance-set comparison on every executor task (both the direct and S3
> transports), so an IP change mid-job also fails task-level validation.
> - {{ReplicaAwareFailureHandler}} and {{ImportCompletionCoordinator}} key
> per-instance state by {{RingInstance}}; an instance observed under an old
> IP and a new IP is counted as two distinct replicas, skewing
> consistency-level accounting.
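
To illustrate the double counting in the second bullet, here is a simplified sketch. It does not reflect how `ReplicaAwareFailureHandler` or `ImportCompletionCoordinator` are actually implemented: `ReplicaKeySketch` is a hypothetical class that uses string keys in place of `RingInstance` objects, and the observations are invented.

```java
import java.util.HashSet;
import java.util.Set;

final class ReplicaKeySketch {
    // Each observation is {fqdn, ipAddress}. The key either includes the IP
    // (IP-sensitive identity) or uses only the fqdn (IP-insensitive identity).
    static int distinctReplicas(String[][] observations, boolean includeIp) {
        Set<String> seen = new HashSet<>();
        for (String[] obs : observations) {
            seen.add(includeIp ? obs[0] + "|" + obs[1] : obs[0]);
        }
        return seen.size();
    }

    // The same logical node acknowledged once before and once after an IP change.
    static String[][] sameNodeTwoIps() {
        return new String[][] {
            {"cassandra-1.example", "10.0.0.6"},
            {"cassandra-1.example", "10.0.0.9"},
        };
    }

    public static void main(String[] args) {
        System.out.println(distinctReplicas(sameNodeTwoIps(), true));  // prints 2
        System.out.println(distinctReplicas(sameNodeTwoIps(), false)); // prints 1
    }
}
```

With a quorum of 2 out of 3 replicas, the IP-sensitive count would treat a single logical node as having satisfied the quorum on its own.
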
> History: {{RingInstance.equals}} originally compared token, fqdn, port, and
> datacenter. CASSANDRA-18852 added the IP address in the same change that
> introduced building {{RingInstance}} from {{ReplicaMetadata}}, which
> carries no token — leaving the IP as a stand-in discriminator.
> Fix: remove the IP address from RingInstance.equals/hashCode. Instance
> identity becomes clusterId, token, fqdn, rack, port, and datacenter. The
> remaining fields are sufficient to distinguish nodes: two live nodes cannot
> share fqdn + port + datacenter. Note that Sidecar resolves fqdn via reverse
> DNS and falls back to the IP string when resolution fails, so deployments
> without DNS see no behavior change; real topology changes (nodes added,
> removed, joining, leaving) are still detected through instance membership and
> pending-state comparison.
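
A rough sketch of the proposed identity, assuming the six fields listed in the fix. `LogicalInstance` is a hypothetical class, not the real `RingInstance`; field types, token values, and hostnames are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical instance type: the IP is carried but excluded from identity.
final class LogicalInstance {
    final String clusterId, token, fqdn, rack, datacenter, ipAddress;
    final int port;

    LogicalInstance(String clusterId, String token, String fqdn, String rack,
                    int port, String datacenter, String ipAddress) {
        this.clusterId = clusterId;
        this.token = token;
        this.fqdn = fqdn;
        this.rack = rack;
        this.port = port;
        this.datacenter = datacenter;
        this.ipAddress = ipAddress;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof LogicalInstance)) return false;
        LogicalInstance that = (LogicalInstance) o;
        // ipAddress is deliberately not compared.
        return port == that.port
            && Objects.equals(clusterId, that.clusterId)
            && Objects.equals(token, that.token)
            && Objects.equals(fqdn, that.fqdn)
            && Objects.equals(rack, that.rack)
            && Objects.equals(datacenter, that.datacenter);
    }

    @Override
    public int hashCode() {
        return Objects.hash(clusterId, token, fqdn, rack, port, datacenter);
    }
}

final class IdentityFixSketch {
    static LogicalInstance node(String name, String token, String ip) {
        return new LogicalInstance("c1", token, name + ".example", "rack1", 9042, "dc1", ip);
    }

    // Same two nodes; the second rejoined with a new IP.
    static boolean ipChangeIsIgnored() {
        Set<LogicalInstance> before = new HashSet<>(Arrays.asList(
            node("cassandra-0", "-100", "10.0.0.5"), node("cassandra-1", "100", "10.0.0.6")));
        Set<LogicalInstance> after = new HashSet<>(Arrays.asList(
            node("cassandra-0", "-100", "10.0.0.5"), node("cassandra-1", "100", "10.0.0.9")));
        return before.equals(after);
    }

    // A third node joins; membership differs, so the sets are unequal.
    static boolean addedNodeIsDetected() {
        Set<LogicalInstance> before = new HashSet<>(Arrays.asList(
            node("cassandra-0", "-100", "10.0.0.5"), node("cassandra-1", "100", "10.0.0.6")));
        Set<LogicalInstance> after = new HashSet<>(before);
        after.add(node("cassandra-2", "200", "10.0.0.7"));
        return !before.equals(after);
    }

    public static void main(String[] args) {
        System.out.println(ipChangeIsIgnored());   // prints true
        System.out.println(addedNodeIsDetected()); // prints true
    }
}
```

Keeping `hashCode` aligned with `equals` matters here, since instances are used as set members and map keys.
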
> Caveat: because the Sidecar's fqdn fallback in a DNS-less environment is
> the IP string, this fix only helps deployments where reverse DNS returns
> stable names, as Kubernetes does. Real topology changes (scale up/down,
> decommission, move) are still caught exactly as before.