[ https://issues.apache.org/jira/browse/ZOOKEEPER-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17037667#comment-17037667 ]
Mate Szalay-Beko edited comment on ZOOKEEPER-2164 at 2/15/20 11:56 PM:
-----------------------------------------------------------------------

[~suhas.dantkale] Actually, after adding some extra logs and analyzing them, I realized that the issue I found and reproduced earlier is indeed caused by the 0.0.0.0 addresses (I mixed up the configs and had in fact used wildcard addresses in the config files). Sorry for misleading you... your root cause analysis is completely correct.

I have a fix that solves this issue, and it is actually quite simple. Sending the address in the initial message was introduced in 3.5.0 (ZOOKEEPER-107); the 3.4 versions never used this field. For backward compatibility (needed during rolling upgrade), 3.5 still has a version of {{QuorumCnxManager.connectOne()}} that takes no election address but uses the last known address to initiate the connection. So the solution can simply be to call this method whenever the received address is a wildcard address (0.0.0.0), which can be detected with {{InetAddress.isAnyLocalAddress()}}. We still have to verify that this change is compatible with dynamic reconfig (I think it is) and that it works with rolling upgrade. (I also considered not sending 0.0.0.0 in the first place, but then I think we would hit parsing errors during rolling upgrades, so the best approach is to keep sending it and just filter it out on the receiver side.) Also, the same change will not work on both the 3.5 and 3.6 branches, as the MultiAddress feature added in 3.6 uses a slightly different message format / internal representation of addresses.

Anyway, as you were the one who found this issue in the first place, let me know if you wish to take it over and work on it. I think it is a change that will require some discussion within the community. Otherwise I will push my PR and do the rest of the work.

BTW: I don't think this is something that can be verified by unit tests.
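To make the receiver-side filtering idea concrete: the check described above could look roughly like the sketch below. This is only an illustration under my own assumptions, not actual ZooKeeper code; the class and method names (other than the JDK's `InetSocketAddress` / `InetAddress.isAnyLocalAddress()`) are hypothetical.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;

// Illustrative sketch (not ZooKeeper source): when the election address
// received in the initial message is a wildcard (0.0.0.0), the receiver
// should ignore it and fall back to the last known address, i.e. call the
// no-address connectOne() overload instead of connectOne(sid, electionAddr).
public class WildcardAddressCheck {

    // True if the received election address is unusable for connecting back.
    static boolean isWildcard(InetSocketAddress electionAddr) {
        InetAddress addr = electionAddr.getAddress();
        // getAddress() returns null for unresolved addresses; treat those
        // as unusable as well.
        return addr == null || addr.isAnyLocalAddress();
    }

    public static void main(String[] args) {
        InetSocketAddress wildcard = new InetSocketAddress("0.0.0.0", 3888);
        InetSocketAddress concrete = new InetSocketAddress("127.0.0.1", 3888);
        System.out.println(isWildcard(wildcard));  // true -> use last known address
        System.out.println(isWildcard(concrete)); // false -> connect to it directly
    }
}
```

The point of routing wildcard addresses through the legacy overload is that the fallback path already exists for rolling-upgrade compatibility, so no new connection logic is needed on the receiver side.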
Even using 0.0.0.0 in the unit tests would always work (it would behave like 127.0.0.1), as we are executing everything on a single machine.

One remaining question for me is whether this ticket was originally about this issue at all. Some of the comments seem to indicate that people were hitting the 0.0.0.0 issue, but the original description mentions ZooKeeper 3.4.5, and that cannot be the issue you and I were discussing here. I still have to look into that.

> fast leader election keeps failing
> ----------------------------------
>
> Key: ZOOKEEPER-2164
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2164
> Project: ZooKeeper
> Issue Type: Bug
> Components: leaderElection
> Affects Versions: 3.4.5
> Reporter: Michi Mutsuzaki
> Assignee: Mate Szalay-Beko
> Priority: Major
> Fix For: 3.7.0, 3.5.8
>
> I have a 3-node cluster with sids 1, 2 and 3. Originally 2 is the leader.
> When I shut down 2, 1 and 3 keep going back to leader election. Here is what
> seems to be happening.
> - Both 1 and 3 elect 3 as the leader.
> - 1 receives votes from 3 and itself, and starts trying to connect to 3 as a
> follower.
> - 3 doesn't receive votes for 5 seconds because connectOne() to 2 doesn't
> timeout for 5 seconds:
> https://github.com/apache/zookeeper/blob/41c9fcb3ca09cd3d05e59fe47f08ecf0b85532c8/src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java#L346
> - By the time 3 receives votes, 1 has given up trying to connect to 3:
> https://github.com/apache/zookeeper/blob/41c9fcb3ca09cd3d05e59fe47f08ecf0b85532c8/src/java/main/org/apache/zookeeper/server/quorum/Learner.java#L247
> I'm using 3.4.5, but it looks like this part of the code hasn't changed for a
> while, so I'm guessing later versions have the same issue.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)