[
https://issues.apache.org/jira/browse/ZOOKEEPER-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13535176#comment-13535176
]
Benjamin Reed commented on ZOOKEEPER-1599:
------------------------------------------
I think we need to be a bit careful about saying that a 3.3 server can work
with a 3.4 server without any caveats. A 3.3 server is not going to understand
new transactions, so at a minimum we would require an option to turn off new
transactions in 3.4 so that it can work with 3.3. Is the plan that people
would run a mixed ensemble of 3.3 and 3.4 servers for a long time, or is it
just for upgrading?

At some point the amount of cruft and work that we have to put in for a very
short-term corner case gets completely unwieldy. (We may have hit that point
already...)

Being able to connect a 3.4 follower to a 3.3 leader enables rolling upgrades.
If we want more, I think we should make it much clearer what the requirements
are. We should also clarify the backward compatibility requirements for servers.
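
To make the "option to turn off new transactions" idea concrete, here is a
minimal sketch of what such a gate might look like. Everything in it is
hypothetical: the class MixedEnsembleGate, the TxnKind enum, and the choice of
MULTI as the "new" transaction kind are illustrative assumptions, not
ZooKeeper's actual classes, opcodes, or configuration options.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch only: none of these names exist in ZooKeeper.
public class MixedEnsembleGate {
    // Illustrative transaction kinds; not ZooKeeper's real OpCode values.
    public enum TxnKind { CREATE, DELETE, SET_DATA, MULTI }

    // Kinds assumed, for illustration, to be unknown to the older release.
    private static final Set<TxnKind> NEW_KINDS = EnumSet.of(TxnKind.MULTI);

    // True while older servers may still be part of the ensemble.
    private final boolean legacyCompatible;

    public MixedEnsembleGate(boolean legacyCompatible) {
        this.legacyCompatible = legacyCompatible;
    }

    // A transaction is refused only if it is new AND compatibility mode is on.
    public boolean allows(TxnKind kind) {
        return !(legacyCompatible && NEW_KINDS.contains(kind));
    }

    public static void main(String[] args) {
        MixedEnsembleGate upgrading = new MixedEnsembleGate(true);
        MixedEnsembleGate upgraded = new MixedEnsembleGate(false);
        System.out.println(upgrading.allows(TxnKind.MULTI));  // false
        System.out.println(upgrading.allows(TxnKind.CREATE)); // true
        System.out.println(upgraded.allows(TxnKind.MULTI));   // true
    }
}
```

The flag would be switched off once every server has been upgraded, which is
why this only makes sense for the short upgrade window and not for a long-lived
mixed ensemble.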
> 3.3 server cannot join 3.4 quorum
> ---------------------------------
>
> Key: ZOOKEEPER-1599
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1599
> Project: ZooKeeper
> Issue Type: Bug
> Components: quorum
> Affects Versions: 3.3.6, 3.4.5
> Reporter: Skye Wanderman-Milne
> Assignee: Skye Wanderman-Milne
> Priority: Blocker
> Fix For: 3.4.6
>
> Attachments: ZOOKEEPER-1599.patch
>
>
> When a 3.3 server attempts to join an existing quorum led by a 3.4 server,
> the 3.3 server is disconnected while trying to download the leader's
> snapshot. The 3.3 server restarts and starts the process over again, but is
> never able to join the quorum.
> 3.3 server log:
> {code}
> 2012-12-07 10:44:34,582 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2183:Learner@294] - Getting a snapshot from leader
> 2012-12-07 10:44:34,582 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2183:Learner@325] - Setting leader epoch 12
> 2012-12-07 10:44:54,604 - WARN [QuorumPeer:/0:0:0:0:0:0:0:0:2183:Follower@82] - Exception when following the leader
> java.io.EOFException
>     at java.io.DataInputStream.readInt(DataInputStream.java:392)
>     at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>     at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:84)
>     at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
>     at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:148)
>     at org.apache.zookeeper.server.quorum.Learner.syncWithLeader(Learner.java:332)
>     at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:75)
>     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:645)
> 2012-12-07 10:44:54,605 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2183:Follower@165] - shutdown called
> java.lang.Exception: shutdown Follower
>     at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:165)
>     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:649)
> {code}
> 3.4 leader log:
> {code}
> 2012-12-07 10:51:35,178 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection$Messenger$WorkerReceiver@273] - Backward compatibility mode, server id=3
> 2012-12-07 10:51:35,178 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 3 (n.leader), 0x1100000000 (n.zxid), 0x2 (n.round), LOOKING (n.state), 3 (n.sid), 0x11 (n.peerEPoch), LEADING (my state)
> 2012-12-07 10:51:35,182 [myid:2] - INFO [LearnerHandler-/127.0.0.1:37654:LearnerHandler@263] - Follower sid: 3 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@262f4873
> 2012-12-07 10:51:35,182 [myid:2] - INFO [LearnerHandler-/127.0.0.1:37654:LearnerHandler@318] - Synchronizing with Follower sid: 3 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x1100000000
> 2012-12-07 10:51:35,182 [myid:2] - INFO [LearnerHandler-/127.0.0.1:37654:LearnerHandler@395] - Sending SNAP
> 2012-12-07 10:51:35,183 [myid:2] - INFO [LearnerHandler-/127.0.0.1:37654:LearnerHandler@419] - Sending snapshot last zxid of peer is 0x1100000000 zxid of leader is 0x1200000000sent zxid of db as 0x1200000000
> 2012-12-07 10:51:55,204 [myid:2] - ERROR [LearnerHandler-/127.0.0.1:37654:LearnerHandler@562] - Unexpected exception causing shutdown while sock still open
> java.net.SocketTimeoutException: Read timed out
>     at java.net.SocketInputStream.socketRead0(Native Method)
>     at java.net.SocketInputStream.read(SocketInputStream.java:150)
>     at java.net.SocketInputStream.read(SocketInputStream.java:121)
>     at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>     at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
>     at java.io.DataInputStream.readInt(DataInputStream.java:387)
>     at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>     at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
>     at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
>     at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:450)
> 2012-12-07 10:51:55,205 [myid:2] - WARN [LearnerHandler-/127.0.0.1:37654:LearnerHandler@575] - ******* GOODBYE /127.0.0.1:37654 ********
> {code}