This has been submitted as a ticket on Kubernetes/Examples: https://github.com/kubernetes/examples/issues/149#issuecomment-347705608
I am running Cassandra on AWS (Kubernetes 1.7.4), but I imagine the same applies elsewhere.

Consider 3 Cassandra nodes as defined by the StatefulSet:

cassandra-0 is in AZ (zone) "a"
cassandra-1 is in AZ (zone) "b"
cassandra-2 is in AZ (zone) "c"

AZ stands for Availability Zone (the same as a Google zone, as far as I have heard). It is like a Cassandra rack (although we don't have a Kubernetes snitch for Cassandra, so to Cassandra it looks like everything is in one rack).

I shut down all Kubernetes nodes in AZ "a" and prevent new nodes from being created in AZ "a" (by removing the zone from the AWS autoscaling group). cassandra-0 therefore cannot start, as its EBS volume (PV) is tied to AZ "a". This simulates one Availability Zone being down.

During this time cassandra-2 fails and needs to be restarted elsewhere (or its host died, or autoscaling killed the host node). The StatefulSet won't self-heal cassandra-2, and it also prevents creating further pods such as cassandra-3, cassandra-4, etc. If we also lose cassandra-1 (maybe the host is gone because of AWS/cloud autoscaling), then we have no Cassandra node left.

The longer the first zone is down, the more we risk having to re-create the other Cassandra nodes (infra is elastic and hosts can go wrong), which does not work until AZ "a" is back online. For a 30-minute outage of AZ "a" it will be fine. But then the clock is ticking...

From the StatefulSet doc: "Before a scaling operation is applied to a Pod, all of its predecessors must be Running and Ready."

Some questions:

Is that expected behaviour? Shouldn't we allow Cassandra to scale nodes in other AZs even if the first one is down?

Is a StatefulSet the right choice then? Would a Deployment + replicas work better? (As far as I know we need the seeds to start first; thereafter we don't care about the order.)

Should we use 3 StatefulSets (1 per zone/rack) joining the same cluster? How do we do that? How do Cassandra nodes recognise that they are in the same cluster? Is there any issue with this approach?
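To make the scenario above concrete, here is a small toy model in Python of the ordering rule quoted from the StatefulSet doc. It is only a sketch of how I understand that rule, not the real controller code: the function `reconcile_once`, the pod states "Pending"/"Ready" and the `zone_of`/`zones_up` structures are all names I made up for illustration. It assumes a pod whose zone is down stays Pending forever, and that the controller never looks past the first pod that is not Ready.

```python
def reconcile_once(replicas, pods, zone_of, zones_up):
    """One pass of a simplified, hypothetical ordered reconcile loop.

    pods maps a pod name to "Pending" or "Ready"; a missing key means the
    pod does not exist. Returns the name of the first pod that was not
    Ready at the start of the pass, or None if all pods were Ready.
    """
    for ordinal in range(replicas):
        name = f"cassandra-{ordinal}"
        if pods.get(name) == "Ready":
            continue
        # First pod that is not Ready: create it if it is missing ...
        if pods.get(name) is None:
            pods[name] = "Pending"
        # ... and it can only become Ready if its zone (and so its volume)
        # is reachable. This is an assumption of the toy model.
        if zone_of[name] in zones_up:
            pods[name] = "Ready"
        # Stop here: pods with a higher ordinal are not touched in this pass.
        return name
    return None


zone_of = {"cassandra-0": "a", "cassandra-1": "b", "cassandra-2": "c"}
zones_up = {"b", "c"}              # AZ "a" is down
pods = {"cassandra-1": "Ready"}    # cassandra-0 lost with AZ "a", cassandra-2 failed

for _ in range(3):
    blocked_on = reconcile_once(3, pods, zone_of, zones_up)

print(blocked_on, pods)
```

In this model every pass stops at cassandra-0, so cassandra-2 is never re-created while AZ "a" is down, even though AZ "c" is healthy. Adding "a" back to `zones_up` lets the next passes bring up cassandra-0 and then cassandra-2.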