[
https://issues.apache.org/jira/browse/ZOOKEEPER-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Beom Heyn Kim updated ZOOKEEPER-2946:
-------------------------------------
Description:
The truncate() function in FileTxnLog.java may fail to properly remove the
uncommitted write. This happens when the follower that has the uncommitted
writes tries to resync with the leader after a few epochs have past. The
failure results in data inconsistency in the in-memory data tree across nodes.
Here is one procedure to reproduce the inconsistency.
Initially:
# Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the
leader)
# Create 5 znodes with initial values as follow (key = value)
{noformat}
/testDivergenceResync0 = 0
/testDivergenceResync1 = 1
/testDivergenceResync2 = 2
/testDivergenceResync3 = 3
/testDivergenceResync4 = 4
{noformat}
To Reproduce:
# Diverge the node 2
a. Shutdown the node 0 and 1
b. Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’
c. Shutdown the node 2
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
a. Shutdown the node 0
b. Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’
c. Shutdown the node 1
# Restart the node 0, 1 and 2 (let them finish with resync)
# Diverge the node 2
a. Shutdown the node 0 and 1
b. Async setData to the node 2 writing 1002 to the key ‘/testDivergenceResync2’
c. Shutdown the node 2
# Restart the node 0 and 2 (let them finish with resync)
# Diverge the node 2
a. Shutdown the node 0
b. Async setData to the node 2 writing 1003 to the key ‘/testDivergenceResync3’
c. Shutdown the node 2
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
a. Shutdown the node 0
b. Async setData to the node 1 writing 1004 to the key ‘/testDivergenceResync4’
c. Shutdown the node 1
# Restart the node 0 and 2 (let them finish with resync)
# Restart the node 1 (let it finish with resync)
Reading each key from each node directly will give us the output:
{noformat}
/testDivergenceResync0 on the node 0 = 0
/testDivergenceResync0 on the node 1 = 0
/testDivergenceResync0 on the node 2 = 0
/testDivergenceResync1 on the node 0 = 1001
/testDivergenceResync1 on the node 1 = 1001
/testDivergenceResync1 on the node 2 = 1001
/testDivergenceResync2 on the node 0 = 2
/testDivergenceResync2 on the node 1 = 2
/testDivergenceResync2 on the node 2 = 2
/testDivergenceResync3 on the node 0 = 3
/testDivergenceResync3 on the node 1 = 3
/testDivergenceResync3 on the node 2 = 1003
/testDivergenceResync4 on the node 0 = 1004
/testDivergenceResync4 on the node 1 = 1004
/testDivergenceResync4 on the node 2 = 1004
{noformat}
Thus, the value of key /testDivergenceResync3 is inconsistent across nodes.
What seems to happen:
# At the step 7, setData (at zxid 0x400000001) writing value 1003 is committed
on the node 2.
{panel:title=Log from the node 2:}
...
2017-11-16 03:08:14,123 [myid:2] - DEBUG [ProcessThread(sid:2
cport:-1)::CommitProcessor@174] - Processing request::
sessionid:0x2000117327c0000 type:setData cxid:0x4 zxid:0x400000001 txntype:5
reqpath:n/a
2017-11-16 03:08:14,124 [myid:2] - DEBUG [ProcessThread(sid:2
cport:-1)::Leader@787] - Proposing:: sessionid:0x2000117327c0000 type:setData
cxid:0x4 zxid:0x400000001 txntype:5 reqpath:n/a
2017-11-16 03:08:14,124 [myid:2] - INFO [SyncThread:2:FileTxnLog@209] -
Creating new log file: log.400000001
2017-11-16 03:08:14,188 [myid:2] - DEBUG [SyncThread:2:Leader@600] - Count for
zxid: 0x400000001 is 1
2017-11-16 03:08:15,752 [myid:2] - INFO
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@512] - Shutting down
2017-11-16 03:08:15,753 [myid:2] - INFO
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@518] - Shutdown called
java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced,
only synced with sids: [ 2 ]
...
{panel}
# At the step 10, the node 2 is restarted and supposed to be properly resync’ed
with the node 0 which is the leader.
a. The node 0 sends TRUNC msg so that the node 2 can truncate the setData
at zxid 0x400000001..
b. On the other hand, the node 2 tries to truncate log to get in sync with
the leader 0x200000001. However, the node 2 failed to properly truncate the
setData at zxid 0x400000001. So, even if resync was finished, the value 1003 is
still remained intact on the node 2 while other nodes have value 3 for the same
key.
c. It seems on the node 2, there is only log.100000001 and log.400000001
but no log.200000001. This seems to cause failing to delete log.400000001
during truncate(). It looks like we will be considering log.400000001 by the
time returning from the init() of FileTxnLog.java so that we will never execute
‘itr.logFile.delete()’ for the log.400000001.
d. Then, after returning from the truncate(), loadDatabase() will be invoked
and log.400000001 will be read and the setData at zxid 0x400000001 gets loaded
into the in-memory data tree.
{panel:title=Log from the node 2:}
...
2017-11-16 03:08:59,051 [myid:2] - INFO
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Follower@65] - FOLLOWING - LEADER
ELECTION TOOK - 215
2017-11-16 03:08:59,052 [myid:2] - INFO
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:QuorumPeer$QuorumServer@184] -
Resolved hostname: 127.0.0.1 to address: /127.0.0.1
2017-11-16 03:08:59,125 [myid:2] - WARN
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Learner@349] - Truncating log to get
in sync with the leader 0x200000001
2017-11-16 03:08:59,125 [myid:2] - DEBUG
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@606] -
Created new input stream
/home/ben/project/strata/test-5-3-ZooKeeper-3.4.11-strata-0.1/data/2/version-2/log.100000001
2017-11-16 03:08:59,125 [myid:2] - DEBUG
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@609] -
Created new input archive
/home/ben/project/strata/test-5-3-ZooKeeper-3.4.11-strata-0.1/data/2/version-2/log.100000001
2017-11-16 03:08:59,126 [myid:2] - DEBUG
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@647] - EOF
excepton java.io.EOFException
2017-11-16 03:08:59,126 [myid:2] - DEBUG
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@606] -
Created new input stream
/home/ben/project/strata/test-5-3-ZooKeeper-3.4.11-strata-0.1/data/2/version-2/log.400000001
2017-11-16 03:08:59,126 [myid:2] - DEBUG
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@609] -
Created new input archive
/home/ben/project/strata/test-5-3-ZooKeeper-3.4.11-strata-0.1/data/2/version-2/log.400000001
2017-11-16 03:08:59,126 [myid:2] - INFO
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileSnap@83] - Reading snapshot
/home/ben/project/strata/test-5-3-ZooKeeper-3.4.11-strata-0.1/data/2/version-2/snapshot.200000001
2017-11-16 03:08:59,127 [myid:2] - DEBUG
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@606] -
Created new input stream
/home/ben/project/strata/test-5-3-ZooKeeper-3.4.11-strata-0.1/data/2/version-2/log.100000001
2017-11-16 03:08:59,127 [myid:2] - DEBUG
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@609] -
Created new input archive
/home/ben/project/strata/test-5-3-ZooKeeper-3.4.11-strata-0.1/data/2/version-2/log.100000001
2017-11-16 03:08:59,128 [myid:2] - DEBUG
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@647] - EOF
excepton java.io.EOFException
2017-11-16 03:08:59,128 [myid:2] - DEBUG
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@606] -
Created new input stream
/home/ben/project/strata/test-5-3-ZooKeeper-3.4.11-strata-0.1/data/2/version-2/log.400000001
2017-11-16 03:08:59,128 [myid:2] - DEBUG
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@609] -
Created new input archive
/home/ben/project/strata/test-5-3-ZooKeeper-3.4.11-strata-0.1/data/2/version-2/log.400000001
2017-11-16 03:08:59,128 [myid:2] - DEBUG
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@647] - EOF
excepton java.io.EOFException
2017-11-16 03:08:59,131 [myid:2] - WARN
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Learner@387] - Got zxid 0x500000001
expected 0x1
2017-11-16 03:08:59,132 [myid:2] - INFO
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnSnapLog@248] - Snapshotting:
0x500000004 to
/home/ben/project/strata/test-5-3-ZooKeeper-3.4.11-strata-0.1/data/2/version-2/snapshot.500000004
...
{panel}
{panel:title=Log from the node 0:}
...
2017-11-16 03:08:59,050 [myid:0] - INFO
[QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:11221:Leader@372] - LEADING - LEADER
ELECTION TOOK - 222
2017-11-16 03:08:59,055 [myid:0] - INFO
[LearnerHandler-/127.0.0.1:54482:LearnerHandler@346] - Follower sid: 2 : info :
org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@266e422
2017-11-16 03:08:59,124 [myid:0] - INFO
[LearnerHandler-/127.0.0.1:54482:LearnerHandler@401] - Synchronizing with
Follower sid: 2 maxCommittedLog=0x500000004 minCommittedLog=0x100000001
peerLastZxid=0x400000001
2017-11-16 03:08:59,124 [myid:0] - DEBUG
[LearnerHandler-/127.0.0.1:54482:LearnerHandler@415] - proposal size is 14
2017-11-16 03:08:59,124 [myid:0] - DEBUG
[LearnerHandler-/127.0.0.1:54482:LearnerHandler@418] - Sending proposals to
follower
2017-11-16 03:08:59,124 [myid:0] - INFO
[LearnerHandler-/127.0.0.1:54482:LearnerHandler@475] - Sending TRUNC
2017-11-16 03:08:59,147 [myid:0] - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory@215] -
Accepted socket connection from /127.0.0.1:55118
2017-11-16 03:08:59,184 [myid:0] - WARN
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxn@383] - Exception
causing close of session 0x0: ZooKeeperServer not running
2017-11-16 03:08:59,184 [myid:0] - DEBUG
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxn@386] - IOException
stack trace
java.io.IOException: ZooKeeperServer not running
at
org.apache.zookeeper.server.NIOServerCnxn.readLength(NIOServerCnxn.java:977)
at
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:257)
at
org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:226)
at java.lang.Thread.run(Thread.java:745)
2017-11-16 03:08:59,184 [myid:0] - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxn@1040] - Closed socket
connection for client /127.0.0.1:55118 (no session established for client)
2017-11-16 03:08:59,224 [myid:0] - INFO
[LearnerHandler-/127.0.0.1:54482:LearnerHandler@535] - Received NEWLEADER-ACK
message from 2
2017-11-16 03:08:59,224 [myid:0] - INFO
[QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:11221:Leader@962] - Have quorum of
supporters, sids: [ 0,2 ]; starting up and setting last processed zxid:
0x600000000
...
{panel}
was:
The truncate() function in FileTxnLog.java may fail to properly remove the
uncommitted write. It can happen when the follower that has the uncommitted
writes tries to resync with the leader after a few epochs went past. So,
inconsistency can occur in the in-memory data tree across nodes. Here is one
procedure to reproduce the inconsistency.
Initially:
# Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the
leader)
# Create 5 znodes with initial values as follow (key = value)
{noformat}
/testDivergenceResync0 = 0
/testDivergenceResync1 = 1
/testDivergenceResync2 = 2
/testDivergenceResync3 = 3
/testDivergenceResync4 = 4
{noformat}
To Reproduce:
# Diverge the node 2
a. Shutdown the node 0 and 1
b. Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’
c. Shutdown the node 2
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
a. Shutdown the node 0
b. Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’
c. Shutdown the node 1
# Restart the node 0, 1 and 2 (let them finish with resync)
# Diverge the node 2
a. Shutdown the node 0 and 1
b. Async setData to the node 2 writing 1002 to the key ‘/testDivergenceResync2’
c. Shutdown the node 2
# Restart the node 0 and 2 (let them finish with resync)
# Diverge the node 2
a. Shutdown the node 0
b. Async setData to the node 2 writing 1003 to the key ‘/testDivergenceResync3’
c. Shutdown the node 2
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
a. Shutdown the node 0
b. Async setData to the node 1 writing 1004 to the key ‘/testDivergenceResync4’
c. Shutdown the node 1
# Restart the node 0 and 2 (let them finish with resync)
# Restart the node 1 (let it finish with resync)
Reading each key from each node directly will give us the output:
{noformat}
/testDivergenceResync0 on the node 0 = 0
/testDivergenceResync0 on the node 1 = 0
/testDivergenceResync0 on the node 2 = 0
/testDivergenceResync1 on the node 0 = 1001
/testDivergenceResync1 on the node 1 = 1001
/testDivergenceResync1 on the node 2 = 1001
/testDivergenceResync2 on the node 0 = 2
/testDivergenceResync2 on the node 1 = 2
/testDivergenceResync2 on the node 2 = 2
/testDivergenceResync3 on the node 0 = 3
/testDivergenceResync3 on the node 1 = 3
/testDivergenceResync3 on the node 2 = 1003
/testDivergenceResync4 on the node 0 = 1004
/testDivergenceResync4 on the node 1 = 1004
/testDivergenceResync4 on the node 2 = 1004
{noformat}
Thus, the value of key /testDivergenceResync3 is inconsistent across nodes.
What seems to happen:
# At the step 7, setData (at zxid 0x400000001) writing value 1003 is committed
on the node 2.
{panel:title=Log from the node 2:}
...
2017-11-16 03:08:14,123 [myid:2] - DEBUG [ProcessThread(sid:2
cport:-1)::CommitProcessor@174] - Processing request::
sessionid:0x2000117327c0000 type:setData cxid:0x4 zxid:0x400000001 txntype:5
reqpath:n/a
2017-11-16 03:08:14,124 [myid:2] - DEBUG [ProcessThread(sid:2
cport:-1)::Leader@787] - Proposing:: sessionid:0x2000117327c0000 type:setData
cxid:0x4 zxid:0x400000001 txntype:5 reqpath:n/a
2017-11-16 03:08:14,124 [myid:2] - INFO [SyncThread:2:FileTxnLog@209] -
Creating new log file: log.400000001
2017-11-16 03:08:14,188 [myid:2] - DEBUG [SyncThread:2:Leader@600] - Count for
zxid: 0x400000001 is 1
2017-11-16 03:08:15,752 [myid:2] - INFO
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@512] - Shutting down
2017-11-16 03:08:15,753 [myid:2] - INFO
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@518] - Shutdown called
java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced,
only synced with sids: [ 2 ]
...
{panel}
# At the step 10, the node 2 is restarted and supposed to be properly resync’ed
with the node 0 which is the leader.
a. The node 0 sends TRUNC msg so that the node 2 can truncate the setData
at zxid 0x400000001..
b. On the other hand, the node 2 tries to truncate log to get in sync with
the leader 0x200000001. However, the node 2 failed to properly truncate the
setData at zxid 0x400000001. So, even if resync was finished, the value 1003 is
still remained intact on the node 2 while other nodes have value 3 for the same
key.
c. It seems on the node 2, there is only log.100000001 and log.400000001
but no log.200000001. This seems to cause failing to delete log.400000001
during truncate(). It looks like we will be considering log.400000001 by the
time returning from the init() of FileTxnLog.java so that we will never execute
‘itr.logFile.delete()’ for the log.400000001.
d. Then, after returning from the truncate(), loadDatabase() will be invoked
and log.400000001 will be read and the setData at zxid 0x400000001 gets loaded
into the in-memory data tree.
{panel:title=Log from the node 2:}
...
2017-11-16 03:08:59,051 [myid:2] - INFO
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Follower@65] - FOLLOWING - LEADER
ELECTION TOOK - 215
2017-11-16 03:08:59,052 [myid:2] - INFO
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:QuorumPeer$QuorumServer@184] -
Resolved hostname: 127.0.0.1 to address: /127.0.0.1
2017-11-16 03:08:59,125 [myid:2] - WARN
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Learner@349] - Truncating log to get
in sync with the leader 0x200000001
2017-11-16 03:08:59,125 [myid:2] - DEBUG
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@606] -
Created new input stream
/home/ben/project/strata/test-5-3-ZooKeeper-3.4.11-strata-0.1/data/2/version-2/log.100000001
2017-11-16 03:08:59,125 [myid:2] - DEBUG
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@609] -
Created new input archive
/home/ben/project/strata/test-5-3-ZooKeeper-3.4.11-strata-0.1/data/2/version-2/log.100000001
2017-11-16 03:08:59,126 [myid:2] - DEBUG
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@647] - EOF
excepton java.io.EOFException
2017-11-16 03:08:59,126 [myid:2] - DEBUG
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@606] -
Created new input stream
/home/ben/project/strata/test-5-3-ZooKeeper-3.4.11-strata-0.1/data/2/version-2/log.400000001
2017-11-16 03:08:59,126 [myid:2] - DEBUG
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@609] -
Created new input archive
/home/ben/project/strata/test-5-3-ZooKeeper-3.4.11-strata-0.1/data/2/version-2/log.400000001
2017-11-16 03:08:59,126 [myid:2] - INFO
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileSnap@83] - Reading snapshot
/home/ben/project/strata/test-5-3-ZooKeeper-3.4.11-strata-0.1/data/2/version-2/snapshot.200000001
2017-11-16 03:08:59,127 [myid:2] - DEBUG
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@606] -
Created new input stream
/home/ben/project/strata/test-5-3-ZooKeeper-3.4.11-strata-0.1/data/2/version-2/log.100000001
2017-11-16 03:08:59,127 [myid:2] - DEBUG
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@609] -
Created new input archive
/home/ben/project/strata/test-5-3-ZooKeeper-3.4.11-strata-0.1/data/2/version-2/log.100000001
2017-11-16 03:08:59,128 [myid:2] - DEBUG
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@647] - EOF
excepton java.io.EOFException
2017-11-16 03:08:59,128 [myid:2] - DEBUG
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@606] -
Created new input stream
/home/ben/project/strata/test-5-3-ZooKeeper-3.4.11-strata-0.1/data/2/version-2/log.400000001
2017-11-16 03:08:59,128 [myid:2] - DEBUG
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@609] -
Created new input archive
/home/ben/project/strata/test-5-3-ZooKeeper-3.4.11-strata-0.1/data/2/version-2/log.400000001
2017-11-16 03:08:59,128 [myid:2] - DEBUG
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@647] - EOF
excepton java.io.EOFException
2017-11-16 03:08:59,131 [myid:2] - WARN
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Learner@387] - Got zxid 0x500000001
expected 0x1
2017-11-16 03:08:59,132 [myid:2] - INFO
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnSnapLog@248] - Snapshotting:
0x500000004 to
/home/ben/project/strata/test-5-3-ZooKeeper-3.4.11-strata-0.1/data/2/version-2/snapshot.500000004
...
{panel}
{panel:title=Log from the node 0:}
...
2017-11-16 03:08:59,050 [myid:0] - INFO
[QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:11221:Leader@372] - LEADING - LEADER
ELECTION TOOK - 222
2017-11-16 03:08:59,055 [myid:0] - INFO
[LearnerHandler-/127.0.0.1:54482:LearnerHandler@346] - Follower sid: 2 : info :
org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@266e422
2017-11-16 03:08:59,124 [myid:0] - INFO
[LearnerHandler-/127.0.0.1:54482:LearnerHandler@401] - Synchronizing with
Follower sid: 2 maxCommittedLog=0x500000004 minCommittedLog=0x100000001
peerLastZxid=0x400000001
2017-11-16 03:08:59,124 [myid:0] - DEBUG
[LearnerHandler-/127.0.0.1:54482:LearnerHandler@415] - proposal size is 14
2017-11-16 03:08:59,124 [myid:0] - DEBUG
[LearnerHandler-/127.0.0.1:54482:LearnerHandler@418] - Sending proposals to
follower
2017-11-16 03:08:59,124 [myid:0] - INFO
[LearnerHandler-/127.0.0.1:54482:LearnerHandler@475] - Sending TRUNC
2017-11-16 03:08:59,147 [myid:0] - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory@215] -
Accepted socket connection from /127.0.0.1:55118
2017-11-16 03:08:59,184 [myid:0] - WARN
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxn@383] - Exception
causing close of session 0x0: ZooKeeperServer not running
2017-11-16 03:08:59,184 [myid:0] - DEBUG
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxn@386] - IOException
stack trace
java.io.IOException: ZooKeeperServer not running
at
org.apache.zookeeper.server.NIOServerCnxn.readLength(NIOServerCnxn.java:977)
at
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:257)
at
org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:226)
at java.lang.Thread.run(Thread.java:745)
2017-11-16 03:08:59,184 [myid:0] - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxn@1040] - Closed socket
connection for client /127.0.0.1:55118 (no session established for client)
2017-11-16 03:08:59,224 [myid:0] - INFO
[LearnerHandler-/127.0.0.1:54482:LearnerHandler@535] - Received NEWLEADER-ACK
message from 2
2017-11-16 03:08:59,224 [myid:0] - INFO
[QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:11221:Leader@962] - Have quorum of
supporters, sids: [ 0,2 ]; starting up and setting last processed zxid:
0x600000000
...
{panel}
Summary: The truncate() function in FileTxnLog.java may fail to
properly remove an uncommitted write resulting in data inconsistency (was: The
truncate() function in FileTxnLog.java may fail to properly remove an
uncommitted write)
> The truncate() function in FileTxnLog.java may fail to properly remove an
> uncommitted write resulting in data inconsistency
> ---------------------------------------------------------------------------------------------------------------------------
>
> Key: ZOOKEEPER-2946
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2946
> Project: ZooKeeper
> Issue Type: Bug
> Affects Versions: 3.4.11, 3.4.12
> Reporter: Beom Heyn Kim
>
> The truncate() function in FileTxnLog.java may fail to properly remove the
> uncommitted write. This happens when the follower that has the uncommitted
> writes tries to resync with the leader after a few epochs have past. The
> failure results in data inconsistency in the in-memory data tree across
> nodes. Here is one procedure to reproduce the inconsistency.
> Initially:
> # Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the
> leader)
> # Create 5 znodes with initial values as follow (key = value)
> {noformat}
> /testDivergenceResync0 = 0
> /testDivergenceResync1 = 1
> /testDivergenceResync2 = 2
> /testDivergenceResync3 = 3
> /testDivergenceResync4 = 4
> {noformat}
> To Reproduce:
> # Diverge the node 2
> a. Shutdown the node 0 and 1
> b. Async setData to the node 2 writing 1000 to the key
> ‘/testDivergenceResync0’
> c. Shutdown the node 2
> # Restart the node 0 and 1 (let them finish with resync)
> # Diverge the node 1
> a. Shutdown the node 0
> b. Async setData to the node 1 writing 1001 to the key
> ‘/testDivergenceResync1’
> c. Shutdown the node 1
> # Restart the node 0, 1 and 2 (let them finish with resync)
> # Diverge the node 2
> a. Shutdown the node 0 and 1
> b. Async setData to the node 2 writing 1002 to the key
> ‘/testDivergenceResync2’
> c. Shutdown the node 2
> # Restart the node 0 and 2 (let them finish with resync)
> # Diverge the node 2
> a. Shutdown the node 0
> b. Async setData to the node 2 writing 1003 to the key
> ‘/testDivergenceResync3’
> c. Shutdown the node 2
> # Restart the node 0 and 1 (let them finish with resync)
> # Diverge the node 1
> a. Shutdown the node 0
> b. Async setData to the node 1 writing 1004 to the key
> ‘/testDivergenceResync4’
> c. Shutdown the node 1
> # Restart the node 0 and 2 (let them finish with resync)
> # Restart the node 1 (let it finish with resync)
> Reading each key from each node directly will give us the output:
> {noformat}
> /testDivergenceResync0 on the node 0 = 0
> /testDivergenceResync0 on the node 1 = 0
> /testDivergenceResync0 on the node 2 = 0
> /testDivergenceResync1 on the node 0 = 1001
> /testDivergenceResync1 on the node 1 = 1001
> /testDivergenceResync1 on the node 2 = 1001
> /testDivergenceResync2 on the node 0 = 2
> /testDivergenceResync2 on the node 1 = 2
> /testDivergenceResync2 on the node 2 = 2
> /testDivergenceResync3 on the node 0 = 3
> /testDivergenceResync3 on the node 1 = 3
> /testDivergenceResync3 on the node 2 = 1003
> /testDivergenceResync4 on the node 0 = 1004
> /testDivergenceResync4 on the node 1 = 1004
> /testDivergenceResync4 on the node 2 = 1004
> {noformat}
> Thus, the value of key /testDivergenceResync3 is inconsistent across nodes.
> What seems to happen:
> # At the step 7, setData (at zxid 0x400000001) writing value 1003 is
> committed on the node 2.
> {panel:title=Log from the node 2:}
> ...
> 2017-11-16 03:08:14,123 [myid:2] - DEBUG [ProcessThread(sid:2
> cport:-1)::CommitProcessor@174] - Processing request::
> sessionid:0x2000117327c0000 type:setData cxid:0x4 zxid:0x400000001 txntype:5
> reqpath:n/a
> 2017-11-16 03:08:14,124 [myid:2] - DEBUG [ProcessThread(sid:2
> cport:-1)::Leader@787] - Proposing:: sessionid:0x2000117327c0000 type:setData
> cxid:0x4 zxid:0x400000001 txntype:5 reqpath:n/a
> 2017-11-16 03:08:14,124 [myid:2] - INFO [SyncThread:2:FileTxnLog@209] -
> Creating new log file: log.400000001
> 2017-11-16 03:08:14,188 [myid:2] - DEBUG [SyncThread:2:Leader@600] - Count
> for zxid: 0x400000001 is 1
> 2017-11-16 03:08:15,752 [myid:2] - INFO
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@512] - Shutting down
> 2017-11-16 03:08:15,753 [myid:2] - INFO
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@518] - Shutdown called
> java.lang.Exception: shutdown Leader! reason: Not sufficient followers
> synced, only synced with sids: [ 2 ]
> ...
> {panel}
> # At the step 10, the node 2 is restarted and supposed to be properly
> resync’ed with the node 0 which is the leader.
> a. The node 0 sends TRUNC msg so that the node 2 can truncate the setData
> at zxid 0x400000001..
> b. On the other hand, the node 2 tries to truncate log to get in sync
> with the leader 0x200000001. However, the node 2 failed to properly truncate
> the setData at zxid 0x400000001. So, even if resync was finished, the value
> 1003 is still remained intact on the node 2 while other nodes have value 3
> for the same key.
> c. It seems on the node 2, there is only log.100000001 and log.400000001
> but no log.200000001. This seems to cause failing to delete log.400000001
> during truncate(). It looks like we will be considering log.400000001 by the
> time returning from the init() of FileTxnLog.java so that we will never
> execute ‘itr.logFile.delete()’ for the log.400000001.
> d. Then, after returning from the truncate(), loadDatabase() will be
> invoked and log.400000001 will be read and the setData at zxid 0x400000001
> gets loaded into the in-memory data tree.
> {panel:title=Log from the node 2:}
> ...
> 2017-11-16 03:08:59,051 [myid:2] - INFO
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Follower@65] - FOLLOWING - LEADER
> ELECTION TOOK - 215
> 2017-11-16 03:08:59,052 [myid:2] - INFO
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:QuorumPeer$QuorumServer@184] -
> Resolved hostname: 127.0.0.1 to address: /127.0.0.1
> 2017-11-16 03:08:59,125 [myid:2] - WARN
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Learner@349] - Truncating log to
> get in sync with the leader 0x200000001
> 2017-11-16 03:08:59,125 [myid:2] - DEBUG
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@606] -
> Created new input stream
> /home/ben/project/strata/test-5-3-ZooKeeper-3.4.11-strata-0.1/data/2/version-2/log.100000001
> 2017-11-16 03:08:59,125 [myid:2] - DEBUG
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@609] -
> Created new input archive
> /home/ben/project/strata/test-5-3-ZooKeeper-3.4.11-strata-0.1/data/2/version-2/log.100000001
> 2017-11-16 03:08:59,126 [myid:2] - DEBUG
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@647] -
> EOF excepton java.io.EOFException
> 2017-11-16 03:08:59,126 [myid:2] - DEBUG
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@606] -
> Created new input stream
> /home/ben/project/strata/test-5-3-ZooKeeper-3.4.11-strata-0.1/data/2/version-2/log.400000001
> 2017-11-16 03:08:59,126 [myid:2] - DEBUG
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@609] -
> Created new input archive
> /home/ben/project/strata/test-5-3-ZooKeeper-3.4.11-strata-0.1/data/2/version-2/log.400000001
> 2017-11-16 03:08:59,126 [myid:2] - INFO
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileSnap@83] - Reading snapshot
> /home/ben/project/strata/test-5-3-ZooKeeper-3.4.11-strata-0.1/data/2/version-2/snapshot.200000001
> 2017-11-16 03:08:59,127 [myid:2] - DEBUG
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@606] -
> Created new input stream
> /home/ben/project/strata/test-5-3-ZooKeeper-3.4.11-strata-0.1/data/2/version-2/log.100000001
> 2017-11-16 03:08:59,127 [myid:2] - DEBUG
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@609] -
> Created new input archive
> /home/ben/project/strata/test-5-3-ZooKeeper-3.4.11-strata-0.1/data/2/version-2/log.100000001
> 2017-11-16 03:08:59,128 [myid:2] - DEBUG
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@647] -
> EOF excepton java.io.EOFException
> 2017-11-16 03:08:59,128 [myid:2] - DEBUG
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@606] -
> Created new input stream
> /home/ben/project/strata/test-5-3-ZooKeeper-3.4.11-strata-0.1/data/2/version-2/log.400000001
> 2017-11-16 03:08:59,128 [myid:2] - DEBUG
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@609] -
> Created new input archive
> /home/ben/project/strata/test-5-3-ZooKeeper-3.4.11-strata-0.1/data/2/version-2/log.400000001
> 2017-11-16 03:08:59,128 [myid:2] - DEBUG
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@647] -
> EOF excepton java.io.EOFException
> 2017-11-16 03:08:59,131 [myid:2] - WARN
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Learner@387] - Got zxid 0x500000001
> expected 0x1
> 2017-11-16 03:08:59,132 [myid:2] - INFO
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnSnapLog@248] - Snapshotting:
> 0x500000004 to
> /home/ben/project/strata/test-5-3-ZooKeeper-3.4.11-strata-0.1/data/2/version-2/snapshot.500000004
> ...
> {panel}
> {panel:title=Log from the node 0:}
> ...
> 2017-11-16 03:08:59,050 [myid:0] - INFO
> [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:11221:Leader@372] - LEADING - LEADER
> ELECTION TOOK - 222
> 2017-11-16 03:08:59,055 [myid:0] - INFO
> [LearnerHandler-/127.0.0.1:54482:LearnerHandler@346] - Follower sid: 2 : info
> : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@266e422
> 2017-11-16 03:08:59,124 [myid:0] - INFO
> [LearnerHandler-/127.0.0.1:54482:LearnerHandler@401] - Synchronizing with
> Follower sid: 2 maxCommittedLog=0x500000004 minCommittedLog=0x100000001
> peerLastZxid=0x400000001
> 2017-11-16 03:08:59,124 [myid:0] - DEBUG
> [LearnerHandler-/127.0.0.1:54482:LearnerHandler@415] - proposal size is 14
> 2017-11-16 03:08:59,124 [myid:0] - DEBUG
> [LearnerHandler-/127.0.0.1:54482:LearnerHandler@418] - Sending proposals to
> follower
> 2017-11-16 03:08:59,124 [myid:0] - INFO
> [LearnerHandler-/127.0.0.1:54482:LearnerHandler@475] - Sending TRUNC
> 2017-11-16 03:08:59,147 [myid:0] - INFO
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory@215] -
> Accepted socket connection from /127.0.0.1:55118
> 2017-11-16 03:08:59,184 [myid:0] - WARN
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxn@383] - Exception
> causing close of session 0x0: ZooKeeperServer not running
> 2017-11-16 03:08:59,184 [myid:0] - DEBUG
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxn@386] - IOException
> stack trace
> java.io.IOException: ZooKeeperServer not running
> at
> org.apache.zookeeper.server.NIOServerCnxn.readLength(NIOServerCnxn.java:977)
> at
> org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:257)
> at
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:226)
> at java.lang.Thread.run(Thread.java:745)
> 2017-11-16 03:08:59,184 [myid:0] - INFO
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxn@1040] - Closed
> socket connection for client /127.0.0.1:55118 (no session established for
> client)
> 2017-11-16 03:08:59,224 [myid:0] - INFO
> [LearnerHandler-/127.0.0.1:54482:LearnerHandler@535] - Received NEWLEADER-ACK
> message from 2
> 2017-11-16 03:08:59,224 [myid:0] - INFO
> [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:11221:Leader@962] - Have quorum of
> supporters, sids: [ 0,2 ]; starting up and setting last processed zxid:
> 0x600000000
> ...
> {panel}
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)