Paolo Patierno created ZOOKEEPER-4708:
-----------------------------------------

             Summary: ZooKeeper 3.6.4 quorum failing due to <unresolved> address
                 Key: ZOOKEEPER-4708
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4708
             Project: ZooKeeper
          Issue Type: Bug
    Affects Versions: 3.8.1, 3.6.4
            Reporter: Paolo Patierno


We work on the Strimzi project, which is about deploying an Apache Kafka 
cluster on Kubernetes together with a ZooKeeper ensemble.

Up to ZooKeeper version 3.6.3 (bundled with Kafka 3.4.0), there were no issues 
when running on minikube for development purposes.

With ZooKeeper version 3.6.4 (bundled with Kafka 3.4.1), we started to see 
issues during quorum formation and leader election.

The first issue was that ZooKeeper pods were not able to bind the quorum port 
3888 to the Cluster IP: during DNS resolution they got the loopback address 
instead.
Below is a sample log from ZooKeeper startup, where you can see the binding at 
127.0.0.1:3888 instead of something like 172.17.0.4:3888 (i.e. a valid, 
non-loopback IP address).
INFO 3 is accepting connections now, my election bind port: 
my-cluster-zookeeper-2.my-cluster-zookeeper-nodes.default.svc/127.0.0.1:3888 
(org.apache.zookeeper.server.quorum.QuorumCnxManager) 
[ListenerHandler-my-cluster-zookeeper-2.my-cluster-zookeeper-nodes.default.svc/127.0.0.1:3888]
This specific issue had two workarounds: setting quorumListenOnAllIPs=true in 
the ZooKeeper configuration, or binding to the 0.0.0.0 address.
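For reference, the first workaround is a one-line change in zoo.cfg (a minimal fragment; everything else in the file stays as before):

```
# zoo.cfg fragment: listen for quorum/election connections on all local
# interfaces instead of only the address the server's hostname resolves to
quorumListenOnAllIPs=true
```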

In any case, it is not clear why this wasn't needed until 3.6.3 but is 
required to get 3.6.4 working. What changed in this respect?

That said, while binding to 0.0.0.0 seems to work fine, using 
quorumListenOnAllIPs=true does not.

Assuming a ZooKeeper ensemble with 3 nodes, the log of the current ZooKeeper 
leader (ID=3) shows the following.
(The lines starting with ** are additional logs we added to 
{{org.apache.zookeeper.server.quorum.Leader#getDesignatedLeader}} in order to 
get more information.)
2023-06-19 12:32:51,990 INFO Have quorum of supporters, sids: [[1, 3],[1, 3]]; 
starting up and setting last processed zxid: 0x100000000 
(org.apache.zookeeper.server.quorum.Leader) 
[QuorumPeer[myid=3](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)]
2023-06-19 12:32:51,990 INFO ** 
newQVAcksetPair.getQuorumVerifier().getVotingMembers().get(self.getId()).addr = 
my-cluster-zookeeper-2.my-cluster-zookeeper-nodes.default.svc/172.17.0.6:2888 
(org.apache.zookeeper.server.quorum.Leader) 
[QuorumPeer[myid=3](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)]
2023-06-19 12:32:51,990 INFO ** self.getQuorumAddress() = 
my-cluster-zookeeper-2.my-cluster-zookeeper-nodes.default.svc/<unresolved>:2888 
(org.apache.zookeeper.server.quorum.Leader) 
[QuorumPeer[myid=3](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)]
2023-06-19 12:32:51,992 INFO ** qs.addr 
my-cluster-zookeeper-2.my-cluster-zookeeper-nodes.default.svc/172.17.0.6:2888, 
qs.electionAddr 
my-cluster-zookeeper-2.my-cluster-zookeeper-nodes.default.svc/172.17.0.6:3888, 
qs.clientAddr/127.0.0.1:12181 (org.apache.zookeeper.server.quorum.QuorumPeer) 
[QuorumPeer[myid=3](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)]
2023-06-19 12:32:51,992 DEBUG zookeeper (org.apache.zookeeper.common.PathTrie) 
[QuorumPeer[myid=3](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)]
2023-06-19 12:32:51,993 WARN Restarting Leader Election 
(org.apache.zookeeper.server.quorum.QuorumPeer) 
[QuorumPeer[myid=3](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)]
So the leader is the ZooKeeper node with ID=3, and it was ACKed by the 
ZooKeeper node with ID=1.
As you can see, we are in the {{Leader#startZkServer}} method and, because 
reconfiguration is enabled, the designatedLeader is computed. The problem is 
that {{Leader#getDesignatedLeader}} is not returning "self" as leader but 
another node (ID=1), because of the difference in the quorum address.
From the above log, it's not an actual difference in terms of addresses: 
{{self.getQuorumAddress()}} is returning an <unresolved> address (even though 
it's still the same hostname, related to the ZooKeeper-2 instance). This 
difference results in allowedToCommit=false; ZooKeeper-2 is still reported as 
leader but is not able to commit, so it prevents any requests and the 
ZooKeeper ensemble gets stuck.
2023-06-19 12:32:51,996 WARN Suggested leader: 1 
(org.apache.zookeeper.server.quorum.QuorumPeer) 
[QuorumPeer[myid=3](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)]
2023-06-19 12:32:51,996 WARN This leader is not the designated leader, it will 
be initialized with allowedToCommit = false 
(org.apache.zookeeper.server.quorum.Leader) 
[QuorumPeer[myid=3](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)]
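The suspected mechanism can be shown in isolation with plain Java (this is just a sketch of our hypothesis, not ZooKeeper code): {{java.net.InetSocketAddress}} never considers a resolved and an unresolved address equal, even when they name exactly the same endpoint, so any address comparison done before DNS resolution succeeds will fail.

```java
import java.net.InetSocketAddress;

public class UnresolvedAddressDemo {
    public static void main(String[] args) {
        // Resolved address: the literal IP is parsed, no DNS lookup needed
        InetSocketAddress resolved = new InetSocketAddress("127.0.0.1", 2888);
        // The same endpoint, but created without resolving it
        InetSocketAddress unresolved =
                InetSocketAddress.createUnresolved("127.0.0.1", 2888);

        // equals() compares the underlying InetAddress when present, so a
        // resolved and an unresolved address are never equal to each other
        System.out.println(resolved.equals(unresolved)); // false
        System.out.println(unresolved.isUnresolved());   // true
    }
}
```

If {{Leader#getDesignatedLeader}} compares addresses this way, an <unresolved> quorum address would explain why "self" is not matched even though the hostname is identical.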
The overall issue could be related to DNS problems, with DNS records not yet 
registered during pod initialization (ZooKeeper is running on Kubernetes). But 
we don't understand why it's not able to recover somehow.
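Recovering would presumably mean retrying the DNS lookup until the Kubernetes headless-service record appears. A minimal, hypothetical sketch of such a retry (the helper name is made up; this is not something ZooKeeper does today, as far as we can tell):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class DnsRetry {
    // Hypothetical helper: retry resolution until the DNS record shows up,
    // rather than caching an <unresolved> address forever
    static InetAddress resolveWithRetry(String host, int attempts, long delayMs)
            throws InterruptedException, UnknownHostException {
        UnknownHostException last = null;
        for (int i = 0; i < attempts; i++) {
            try {
                return InetAddress.getByName(host);
            } catch (UnknownHostException e) {
                last = e;            // record not registered yet; wait and retry
                Thread.sleep(delayMs);
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(resolveWithRetry("localhost", 3, 100));
    }
}
```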

What we can't find a reason for is why ZooKeeper 3.6.3 didn't need any 
binding-specific configuration and worked just fine, while the new 3.6.4 
needs it. Beyond that, why do we see the issue described above even after the 
binding works fine?

We also tried the latest ZooKeeper 3.8.1 and faced the same issue.

It's worth mentioning that the issue wasn't reproducible in environments like 
AKS or OpenShift, but if it's somehow related to slow DNS, it could happen 
anywhere (not only on minikube as in our use case).

FYI, this is the discussion we had in the upstream Strimzi community: 
https://github.com/strimzi/strimzi-kafka-operator/issues/8675



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
