[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13538336#comment-13538336
 ] 

Flavio Junqueira commented on ZOOKEEPER-1599:
---------------------------------------------

bq. he shouldn't be allowed to make progress even if there's a 3.4 quorum since 
he might not understand the 3.4 commands.

If that's the case, then 3.4 is not backward compatible independent of zab. Are 
the two branches not backward compatible?

bq. I was actually only thinking of the simpler scenario where 3.3 follower 
will time out and was worried that in the future a 3.4 follower connecting to a 
3.5 leader may not time out by itself and we'll need to explicitly prevent it 
from connecting to avoid him getting commands that it doesn't understand.

I think it has been mentioned here in this jira that consecutive branches need 
to be backward compatible, so between 3.4 and 3.5 servers they should be able 
understand each other, there shouldn't be commands they don't understand.   


                
> 3.3 server cannot join 3.4 quorum
> ---------------------------------
>
>                 Key: ZOOKEEPER-1599
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1599
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.6, 3.4.5
>            Reporter: Skye Wanderman-Milne
>            Assignee: Skye Wanderman-Milne
>            Priority: Blocker
>             Fix For: 3.4.6
>
>         Attachments: ZOOKEEPER-1599.patch
>
>
> When a 3.3 server attempts to join an existing quorum lead by a 3.4 server, 
> the 3.3 server is disconnected while trying to download the leader's 
> snapshot. The 3.3 server restarts and starts the process over again, but is 
> never able to join the quorum.
> 3.3 server log:
> {code}
> 2012-12-07 10:44:34,582 - INFO  
> [QuorumPeer:/0:0:0:0:0:0:0:0:2183:Learner@294] - Getting a snapshot from 
> leader
> 2012-12-07 10:44:34,582 - INFO  
> [QuorumPeer:/0:0:0:0:0:0:0:0:2183:Learner@325] - Setting leader epoch 12
> 2012-12-07 10:44:54,604 - WARN  
> [QuorumPeer:/0:0:0:0:0:0:0:0:2183:Follower@82] - Exception when following the 
> leader
> java.io.EOFException
>         at java.io.DataInputStream.readInt(DataInputStream.java:392)
>         at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>         at 
> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:84)
>         at 
> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
>         at 
> org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:148)
>         at 
> org.apache.zookeeper.server.quorum.Learner.syncWithLeader(Learner.java:332)
>         at 
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:75)
>         at 
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:645)
> 2012-12-07 10:44:54,605 - INFO  
> [QuorumPeer:/0:0:0:0:0:0:0:0:2183:Follower@165] - shutdown called
> java.lang.Exception: shutdown Follower
>         at 
> org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:165)
>         at 
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:649)
> {code}
> 3.4 leader log:
> {code}
> 2012-12-07 10:51:35,178 [myid:2] - INFO  
> [WorkerReceiver[myid=2]:FastLeaderElection$Messenger$WorkerReceiver@273] - 
> Backward compatibility mode, server id=3
> 2012-12-07 10:51:35,178 [myid:2] - INFO  
> [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 3 (n.leader), 
> 0x1100000000 (n.zxid), 0x2 (n.round), LOOKING (n.state), 3 (n.sid), 0x11 
> (n.peerEPoch), LEADING (my state)
> 2012-12-07 10:51:35,182 [myid:2] - INFO  
> [LearnerHandler-/127.0.0.1:37654:LearnerHandler@263] - Follower sid: 3 : info 
> : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@262f4873
> 2012-12-07 10:51:35,182 [myid:2] - INFO  
> [LearnerHandler-/127.0.0.1:37654:LearnerHandler@318] - Synchronizing with 
> Follower sid: 3 maxCommittedLog=0x0 minCommittedLog=0x0 
> peerLastZxid=0x1100000000
> 2012-12-07 10:51:35,182 [myid:2] - INFO  
> [LearnerHandler-/127.0.0.1:37654:LearnerHandler@395] - Sending SNAP
> 2012-12-07 10:51:35,183 [myid:2] - INFO  
> [LearnerHandler-/127.0.0.1:37654:LearnerHandler@419] - Sending snapshot last 
> zxid of peer is 0x1100000000  zxid of leader is 0x1200000000sent zxid of db 
> as 0x1200000000
> 2012-12-07 10:51:55,204 [myid:2] - ERROR 
> [LearnerHandler-/127.0.0.1:37654:LearnerHandler@562] - Unexpected exception 
> causing shutdown while sock still open
> java.net.SocketTimeoutException: Read timed out
>         at java.net.SocketInputStream.socketRead0(Native Method)
>         at java.net.SocketInputStream.read(SocketInputStream.java:150)
>         at java.net.SocketInputStream.read(SocketInputStream.java:121)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
>         at java.io.DataInputStream.readInt(DataInputStream.java:387)
>         at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>         at 
> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
>         at 
> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
>         at 
> org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:450)
> 2012-12-07 10:51:55,205 [myid:2] - WARN  
> [LearnerHandler-/127.0.0.1:37654:LearnerHandler@575] - ******* GOODBYE 
> /127.0.0.1:37654 ********
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to