Hi all, We recently hit a scenario where clients kept reading from the previously active NameNode (NN). The active NameNode was at almost 100% memory usage: the process was alive but stuck in a constant GC loop. Its ZooKeeper session expired, and our automatic failover setup promoted a standby NameNode to active. Some clients did not detect the failover and kept reading from the previously active NN. We use QJM + ZKFC to coordinate automatic failover. We do not have real fencing enabled in our cluster: our fencing configuration is the no-op method shell(/bin/true).
We run on the AWS public cloud and deploy with Kubernetes. Is there an existing fencing script for Kubernetes deployments? Given that Kubernetes is widely adopted, I was thinking of contributing a Kubernetes fencing script if one does not already exist, but I wanted to check with the community first. Please advise. Thank you.
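
To make the idea concrete, a Kubernetes fencing script could look roughly like the sketch below. It is only an illustration, not an existing Hadoop component. It assumes the script is invoked through the shell(...) fencing method, which exports the NameNode being fenced as the target_host environment variable. It also assumes each NameNode runs as a StatefulSet pod behind a headless service, so that the first DNS label of target_host is the pod name. FENCE_NAMESPACE and the function names are hypothetical.

```python
#!/usr/bin/env python3
"""Sketch of a Kubernetes fencing helper for HDFS HA (illustrative only)."""
import os
import subprocess
import sys


def pod_name_from_host(target_host):
    # Assumption: StatefulSet pod behind a headless service, so the first
    # DNS label of the hostname (e.g. "namenode-0") is the pod name.
    return target_host.split(".")[0]


def build_fence_command(env):
    """Build the kubectl command that force-deletes the old active NN pod."""
    target_host = env.get("target_host")
    if not target_host:
        raise ValueError("target_host is not set")
    # FENCE_NAMESPACE is a hypothetical variable introduced for this sketch.
    namespace = env.get("FENCE_NAMESPACE", "default")
    return [
        "kubectl", "delete", "pod", pod_name_from_host(target_host),
        "--namespace", namespace,
        "--grace-period=0", "--force",
    ]


def main():
    try:
        cmd = build_fence_command(os.environ)
    except ValueError as exc:
        print(f"fencing failed: {exc}", file=sys.stderr)
        return 1
    # A non-zero exit code tells the failover controller that fencing failed.
    return subprocess.run(cmd).returncode

# To use this as a script, end the file with: sys.exit(main())
```

One known limitation of this approach: a forced pod deletion removes the pod from the API server without waiting for confirmation that the process has stopped on the node, so a real script would probably need an additional check before reporting success.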