symat commented on a change in pull request #1251: ZOOKEEPER-3720: Fix rolling upgrade failure (invalid protocol version) URL: https://github.com/apache/zookeeper/pull/1251#discussion_r378790887
########## File path: zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumCnxManager.java ########## @@ -416,10 +424,20 @@ private boolean startConnection(Socket sock, Long sid) throws IOException { // Sending id and challenge - // represents protocol version (in other words - message type) - dout.writeLong(PROTOCOL_VERSION); + // First sending the protocol version (in other words - message type). + // For backward compatibility reasons we stick to the old protocol version, unless the MultiAddress + // feature is enabled. During rolling upgrade, we must make sure that all the servers can + // understand the protocol version we use to avoid multiple partitions. see ZOOKEEPER-3720 + long protocolVersion = self.isMultiAddressEnabled() ? PROTOCOL_VERSION_3_6_0_MULTI_ADDRESS : PROTOCOL_VERSION_3_4_7; + dout.writeLong(protocolVersion); Review comment: OK, I did some deeper digging in the git history, this is what I found: The different version numbers / behaviors we had before: - protocol version originally introduced by ZOOKEEPER-107 in 3.5.0 - before 3.5.0, always a positive SID was sent in the beginning of the connection initiation message and there were no address information there. Even 3.4.14 still only sends a positive SID. - since ZOOKEEPER-107 (in 3.5.0+): version -65536L (0xffff0000) was sent, and on the receiver side: - if any positive (incl. 0) Long number received, then accept the number as SID and get the address from the own config - if -65536L was received, then parsing the message with the new format and use the address provided there - for any other negative numbers, we fail - in ZOOKEEPER-1633 (3.4.7), the receiver logic was making more clever to expect the SID as the second Long in the message, if the first Long was negative. So from 3.4.7 all the 3.4 servers can at least not die due to the new message formats, and rolling upgrades become possible without multiple partitions. However, all 3.4.X still sends only the positive SID and will not use any address information from the initial message (just reads the SID). This also means, that we can choose any negative number as new protocol version for the MultiAddress feature, and then the rolling upgrade from 3.4.7-3.4.14 to 3.6.0 could work theoretically even with MultiAddress enabled, as long as we always send the SID right after the protocol version. But the value of the protocol version is actually checked in 3.5.0, so we definitely need to disable MultiAddress to be able to get the rolling upgrade to work from 3.5. Long story short: we only have a single protocol version, -65536 (0xffff0000) as of now, plus the case when positive numbers are sent. The -65535 (0xffff0001) seems to me a good protocol version candidate for the multiaddress compatible protocol, since it is also only a single bit difference between the two. What do you think? But I just don't know how this whole thing should look in the later releases. Also this whole protocol version is now only about the initial message. We don't know if there is any other incompatibility between the other messages. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services