[ https://issues.apache.org/jira/browse/CASSANALYTICS-175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18089404#comment-18089404 ]
Michael Burman commented on CASSANALYTICS-175:
----------------------------------------------
> One thing to be aware of: in a DNS-less environment the sidecar's fqdn
> fallback is the IP string, so this fix only helps deployments where reverse
> DNS gives stable names — which K8s does.
This isn't actually true in the context of Sidecar. In our testing we ran into
a case where the reverse DNS lookup used to derive the FQDN returned a
different value from one call to the next. This happens because in Kubernetes
(especially with CoreDNS) a PTR record is created each time a headless service
matches a Pod. In our system certain pods are matched by multiple headless
services, so they in turn get multiple reverse DNS records.
When the Sidecar resolves the FQDN, it gets only a single value from the DNS
server, and that value can change between calls. The result is reported as a
"topology change" even though nothing in the cluster has changed.
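
To make the failure mode concrete, here is a minimal sketch in plain Java. It
is not Sidecar code: RotatingPtrResolver, PtrRotationDemo, and the hostnames
are hypothetical, and the round-robin order is an assumption made only to keep
the simulation deterministic. A real DNS server may order its answers
differently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a reverse DNS lookup against a server that holds
// several PTR records for one pod IP. Assumption: answers rotate round-robin.
final class RotatingPtrResolver {
    private final List<String> ptrNames;
    private int next = 0;

    RotatingPtrResolver(List<String> ptrNames) {
        this.ptrNames = new ArrayList<>(ptrNames);
    }

    // Returns one name per call, like a caller that keeps only a single answer.
    String reverseLookup(String ip) {
        String name = ptrNames.get(next % ptrNames.size());
        next++;
        return name;
    }
}

final class PtrRotationDemo {
    // Builds the set of instance names that one poll of the cluster would see.
    static Set<String> snapshot(RotatingPtrResolver resolver, List<String> ips) {
        Set<String> names = new HashSet<>();
        for (String ip : ips) {
            names.add(resolver.reverseLookup(ip));
        }
        return names;
    }

    // True when two polls of the same, unchanged pod yield different name sets.
    static boolean looksLikeTopologyChange(List<String> ptrNames, String ip) {
        RotatingPtrResolver resolver = new RotatingPtrResolver(ptrNames);
        Set<String> atJobStart = snapshot(resolver, Collections.singletonList(ip));
        Set<String> atNextPoll = snapshot(resolver, Collections.singletonList(ip));
        return !atJobStart.equals(atNextPoll);
    }
}
```

With two PTR names for one IP, the two snapshots differ although the pod never
moved. With a single PTR name they are identical.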
> Bulk write jobs fail when a node returns with a different IP address
> --------------------------------------------------------------------
>
> Key: CASSANALYTICS-175
> URL: https://issues.apache.org/jira/browse/CASSANALYTICS-175
> Project: Apache Cassandra Analytics
> Issue Type: Bug
> Components: Writer
> Reporter: Jon Haddad
> Assignee: Jon Haddad
> Priority: Normal
> Fix For: 0.5
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> During S3 bulk writes, CassandraTopologyMonitor polls the cluster topology
> every 5 seconds and cancels the job if the current topology is not equal to
> the topology captured at job start. The comparison is
> TokenRangeMapping.equals, which compares instance sets using
> RingInstance.equals — and RingInstance equality includes the node's IP
> address.
> A node that goes down and rejoins with a different IP address (routine in
> Kubernetes, where a rescheduled pod keeps its hostname and host ID but gets a
> new IP) is the same logical instance, with the same token ownership. The
> write remains correct and safe to continue. But because the IP participates
> in equality, the monitor reports "Topology changed during bulk write" and
> fails the job. On clusters with hundreds of nodes across multiple DCs, the
> probability of at least one pod replacement during a long-running job makes
> this a frequent, spurious failure mode.
> The monitor is not the only affected path:
> - {{RecordWriter.validateTaskTokenRangeMappings}} performs the same
> instance-set comparison on every executor task (both the direct and S3
> transports), so an IP change mid-job also fails task-level validation.
> - {{ReplicaAwareFailureHandler}} and {{ImportCompletionCoordinator}} key
> per-instance state by {{RingInstance}}; an instance observed under an old
> IP and a new IP is counted as two distinct replicas, skewing
> consistency-level accounting.
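
The double counting described in the bullets above can be sketched as follows.
This is an illustration only: InstanceKey is a hypothetical, simplified key and
not the real RingInstance, and the hostname, port, and addresses are made up.
The flag switches between IP-inclusive and IP-free equality so both behaviours
can be compared side by side.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical, simplified instance key. NOT the real RingInstance class.
final class InstanceKey {
    final String fqdn;
    final int port;
    final String datacenter;
    final String ipAddress;
    // Demo switch: when true, the IP takes part in equals/hashCode.
    // Every key in one collection is assumed to use the same setting.
    final boolean ipInEquality;

    InstanceKey(String fqdn, int port, String datacenter, String ipAddress, boolean ipInEquality) {
        this.fqdn = fqdn;
        this.port = port;
        this.datacenter = datacenter;
        this.ipAddress = ipAddress;
        this.ipInEquality = ipInEquality;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof InstanceKey)) return false;
        InstanceKey that = (InstanceKey) o;
        boolean sameIdentity = port == that.port
                && fqdn.equals(that.fqdn)
                && datacenter.equals(that.datacenter);
        return ipInEquality ? sameIdentity && ipAddress.equals(that.ipAddress) : sameIdentity;
    }

    @Override
    public int hashCode() {
        return ipInEquality
                ? Objects.hash(fqdn, port, datacenter, ipAddress)
                : Objects.hash(fqdn, port, datacenter);
    }
}

final class ReplicaCountDemo {
    // Counts distinct replicas when one logical node is seen under two IPs.
    static int distinctReplicas(boolean ipInEquality) {
        Set<InstanceKey> seen = new HashSet<>();
        seen.add(new InstanceKey("cassandra-0.example", 9043, "dc1", "10.0.0.5", ipInEquality));
        seen.add(new InstanceKey("cassandra-0.example", 9043, "dc1", "10.0.0.9", ipInEquality));
        return seen.size();
    }
}
```

With the IP in equality the same node is counted as two replicas; without it,
the node is counted once.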
> History: {{RingInstance.equals}} originally compared token, fqdn, port, and
> datacenter. CASSANDRA-18852 added the IP address in the same change that
> introduced building {{RingInstance}} from {{ReplicaMetadata}}, which
> carries no token — leaving the IP as a stand-in discriminator.
> Fix: remove the IP address from RingInstance.equals/hashCode. Instance
> identity becomes clusterId, token, fqdn, rack, port, and datacenter. The
> remaining fields are sufficient to distinguish nodes: two live nodes cannot
> share fqdn + port + datacenter. Note that Sidecar resolves fqdn via reverse
> DNS and falls back to the IP string when resolution fails, so deployments
> without DNS see no behavior change; real topology changes (nodes added,
> removed, joining, leaving) are still detected through instance membership and
> pending-state comparison.
> One thing to be aware of: in a DNS-less environment the sidecar's fqdn
> fallback is the IP string, so this fix only helps deployments where reverse
> DNS gives stable names — which K8s does. Real topology changes (scale
> up/down, decommission, move) are still caught exactly as before.
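
A rough sketch of the proposed comparison, assuming a much-reduced identity of
fqdn, port, and datacenter only. NodeIdentity and TopologyCheckDemo are
hypothetical names, not classes from the project, and the real fix also keeps
clusterId, token, and rack in equality. The sketch shows the three cases
discussed above: an IP change under a stable name, a node being added, and the
DNS-less fallback where the fqdn is the IP string itself.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical, reduced node identity. The IP is carried for connectivity
// but deliberately left out of equals/hashCode.
final class NodeIdentity {
    final String fqdn;
    final int port;
    final String datacenter;
    final String ipAddress;

    NodeIdentity(String fqdn, int port, String datacenter, String ipAddress) {
        this.fqdn = fqdn;
        this.port = port;
        this.datacenter = datacenter;
        this.ipAddress = ipAddress;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof NodeIdentity)) return false;
        NodeIdentity that = (NodeIdentity) o;
        return port == that.port
                && fqdn.equals(that.fqdn)
                && datacenter.equals(that.datacenter);
    }

    @Override
    public int hashCode() {
        return Objects.hash(fqdn, port, datacenter);
    }
}

final class TopologyCheckDemo {
    // Convenience helper to build a snapshot of the cluster membership.
    static Set<NodeIdentity> setOf(NodeIdentity... nodes) {
        return new HashSet<>(Arrays.asList(nodes));
    }

    // Mirrors the monitor's check: any difference in membership is a change.
    static boolean topologyChanged(Set<NodeIdentity> atJobStart, Set<NodeIdentity> current) {
        return !atJobStart.equals(current);
    }
}
```

Under these assumptions a rescheduled pod that keeps its name is not reported
as a change, an added node still is, and when the fqdn falls back to the IP
string an IP change is still reported, which matches the "no behavior change"
note for DNS-less deployments. The sketch also makes the dependency explicit:
it only helps if the fqdn itself is stable between lookups.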