[
https://issues.apache.org/jira/browse/ZOOKEEPER-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13536467#comment-13536467
]
Flavio Junqueira commented on ZOOKEEPER-1599:
---------------------------------------------
[~shralex]
bq. Right now it's always the same (0x10000). I think we should increase it with
every major release.
In my understanding, we bump the version only when the protocol itself
changes. The protocol version is supposed to track changes to the protocol,
not to ZooKeeper overall.
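To make the distinction concrete, here is a minimal Java sketch of a protocol-version gate. The class and method names are hypothetical and this is not the actual ZooKeeper code; only the value 0x10000 comes from this thread. The check keys off the wire-protocol version alone, so under this assumption a release that leaves the protocol untouched would not need a new value.

```java
// Hypothetical sketch, not ZooKeeper source: all names are illustrative.
final class ProtocolVersionSketch {
    // Protocol version quoted in this thread.
    static final int CURRENT_PROTOCOL_VERSION = 0x10000;

    // Returns true when the peer announced an older protocol version,
    // i.e. when a backward-compatible code path would be required.
    static boolean needsCompatibilityMode(int peerProtocolVersion) {
        return peerProtocolVersion < CURRENT_PROTOCOL_VERSION;
    }

    public static void main(String[] args) {
        // 0x1 stands in for an assumed older peer; 0x10000 is a current peer.
        int[] announced = {0x1, 0x10000};
        for (int version : announced) {
            System.out.println("peer version 0x" + Integer.toHexString(version)
                    + " -> compatibility mode: " + needsCompatibilityMode(version));
        }
    }
}
```

The decision depends only on the announced protocol version, never on the ZooKeeper release number, which is the behaviour I am arguing for.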
bq. As a side note, here's a way to upgrade the system with ZK-107 without any
downtime: if you have servers A, B, C running 3.3, you can connect three new
servers A', B', C' running 3.4 (they must have different ids) and just switch
the membership from ABC to A'B'C'. You'll be able to make the switch only once
A'B'C' is up to date, and the switch will not involve downtime beyond a
momentary leader handoff (quicker than a usual leader change). But because the
new ids must be different from the old ones, I'm not sure people will use this
method.
Isn't it necessary that the servers running 3.3 also understand reconfigs?
> 3.3 server cannot join 3.4 quorum
> ---------------------------------
>
> Key: ZOOKEEPER-1599
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1599
> Project: ZooKeeper
> Issue Type: Bug
> Components: quorum
> Affects Versions: 3.3.6, 3.4.5
> Reporter: Skye Wanderman-Milne
> Assignee: Skye Wanderman-Milne
> Priority: Blocker
> Fix For: 3.4.6
>
> Attachments: ZOOKEEPER-1599.patch
>
>
> When a 3.3 server attempts to join an existing quorum led by a 3.4 server,
> the 3.3 server is disconnected while trying to download the leader's
> snapshot. The 3.3 server restarts and starts the process over again, but is
> never able to join the quorum.
> 3.3 server log:
> {code}
> 2012-12-07 10:44:34,582 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2183:Learner@294] - Getting a snapshot from leader
> 2012-12-07 10:44:34,582 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2183:Learner@325] - Setting leader epoch 12
> 2012-12-07 10:44:54,604 - WARN [QuorumPeer:/0:0:0:0:0:0:0:0:2183:Follower@82] - Exception when following the leader
> java.io.EOFException
>     at java.io.DataInputStream.readInt(DataInputStream.java:392)
>     at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>     at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:84)
>     at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
>     at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:148)
>     at org.apache.zookeeper.server.quorum.Learner.syncWithLeader(Learner.java:332)
>     at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:75)
>     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:645)
> 2012-12-07 10:44:54,605 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2183:Follower@165] - shutdown called
> java.lang.Exception: shutdown Follower
>     at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:165)
>     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:649)
> {code}
> 3.4 leader log:
> {code}
> 2012-12-07 10:51:35,178 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection$Messenger$WorkerReceiver@273] - Backward compatibility mode, server id=3
> 2012-12-07 10:51:35,178 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 3 (n.leader), 0x1100000000 (n.zxid), 0x2 (n.round), LOOKING (n.state), 3 (n.sid), 0x11 (n.peerEPoch), LEADING (my state)
> 2012-12-07 10:51:35,182 [myid:2] - INFO [LearnerHandler-/127.0.0.1:37654:LearnerHandler@263] - Follower sid: 3 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@262f4873
> 2012-12-07 10:51:35,182 [myid:2] - INFO [LearnerHandler-/127.0.0.1:37654:LearnerHandler@318] - Synchronizing with Follower sid: 3 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x1100000000
> 2012-12-07 10:51:35,182 [myid:2] - INFO [LearnerHandler-/127.0.0.1:37654:LearnerHandler@395] - Sending SNAP
> 2012-12-07 10:51:35,183 [myid:2] - INFO [LearnerHandler-/127.0.0.1:37654:LearnerHandler@419] - Sending snapshot last zxid of peer is 0x1100000000 zxid of leader is 0x1200000000sent zxid of db as 0x1200000000
> 2012-12-07 10:51:55,204 [myid:2] - ERROR [LearnerHandler-/127.0.0.1:37654:LearnerHandler@562] - Unexpected exception causing shutdown while sock still open
> java.net.SocketTimeoutException: Read timed out
>     at java.net.SocketInputStream.socketRead0(Native Method)
>     at java.net.SocketInputStream.read(SocketInputStream.java:150)
>     at java.net.SocketInputStream.read(SocketInputStream.java:121)
>     at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>     at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
>     at java.io.DataInputStream.readInt(DataInputStream.java:387)
>     at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>     at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
>     at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
>     at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:450)
> 2012-12-07 10:51:55,205 [myid:2] - WARN [LearnerHandler-/127.0.0.1:37654:LearnerHandler@575] - ******* GOODBYE /127.0.0.1:37654 ********
> {code}