Hi,
even though we already have enough binding +1s on 3.6.0rc2, before
closing the VOTE for 3.6.0 I wanted to finish my tests, and I have run
into an apparent blocker.

I am trying to upgrade a 3.5.6 cluster to 3.6.0, but it looks like the
peers are not able to talk to each other.
I have a cluster of 3 servers: server1, server2 and server3.
When I upgrade server1 to 3.6.0rc2, I see errors like this on the 3.5 nodes:

2020-02-10 09:35:07,745 [myid:3] - INFO [localhost/127.0.0.1:3334:QuorumCnxManager$Listener@918] - Received connection request 127.0.0.1:62591
2020-02-10 09:35:07,746 [myid:3] - ERROR [localhost/127.0.0.1:3334:QuorumCnxManager@527] -
org.apache.zookeeper.server.quorum.QuorumCnxManager$InitialMessage$InitialMessageException: Got unrecognized protocol version -65535

Once I upgrade all of the peers, the system is up and running, with
apparently no data loss.

During the upgrade, as soon as I upgrade the first node, say server1,
server1 is not able to accept connections from clients (error "Close of
session 0x0 java.io.IOException: ZooKeeperServer not running"). This
is expected: as long as it cannot talk to the other peers, it is
practically partitioned away from the cluster.

My questions are:
1) Is this expected? I can't remember any protocol changes from 3.5 to
3.6, but 3.6 diverged from the 3.5 branch a long time ago, and I was
not in the community as a dev back then, so I cannot tell.
2) Is this a viable option for users? That is, to accept some temporary
glitch during the upgrade and hope that the upgrade completes without
trouble?

In theory, in a cluster of 3, as long as two servers are running the
same major version (3.5 or 3.6) we have a quorum, and the system is
able to make progress and to serve clients.
I feel that this is quite dangerous, but I don't have enough context
to understand how this problem is possible and when we decided to
break compatibility.
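To make the quorum arithmetic above concrete, here is a minimal sketch. It assumes (my assumption, not something I verified in the code) that a peer can only form a quorum with peers on the same release line, as the error above suggests; the function names are hypothetical and only model the majority rule, not ZooKeeper's real election code.

```python
from collections import Counter
from typing import Dict


def majority(ensemble_size: int) -> int:
    """Smallest number of servers that forms a majority quorum."""
    return ensemble_size // 2 + 1


def can_form_quorum(versions: Dict[str, str]) -> bool:
    """Return True if some group of same-version servers is a majority.

    Hypothetical model: peers only talk to peers running the same
    release line (e.g. "3.5" or "3.6"); every listed server is up.
    """
    if not versions:
        return False
    largest_group = max(Counter(versions.values()).values())
    return largest_group >= majority(len(versions))


# Rolling upgrade of a 3-server cluster, one node at a time.
steps = [
    {"server1": "3.5", "server2": "3.5", "server3": "3.5"},
    {"server1": "3.6", "server2": "3.5", "server3": "3.5"},
    {"server1": "3.6", "server2": "3.6", "server3": "3.5"},
    {"server1": "3.6", "server2": "3.6", "server3": "3.6"},
]
for step in steps:
    print(step, "quorum" if can_form_quorum(step) else "NO quorum")
```

Under this model every step of a one-at-a-time upgrade of 3 servers still has a majority of 2, but in the mixed steps there is no slack: if one server of the majority group fails, the ensemble stops. With 4 servers split 2/2 no side has a majority at all.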

The other possibility is that my test is wrong and I am messing something up :-)

The other upgrade path I would like to see working like a charm is the
upgrade from 3.4 to 3.6: as soon as we release 3.6, I think we should
encourage users to move to 3.6 rather than to 3.5.

Regards
Enrico
