[
https://issues.apache.org/jira/browse/ZOOKEEPER-4681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
krystal he updated ZOOKEEPER-4681:
----------------------------------
Description:
Using a [tool|https://github.com/kry4tall/CC-ZOO358] that I modified from
[Filip Niksic's zootester|https://github.com/fniksic/zootester] for testing
ZooKeeper, I discovered the following scenario, which causes uncommitted
requests to be executed.
The Zab protocol has three rounds: PROPOSE, ACK, and COMMIT. By adding code to
the ZooKeeper source, my tool can drop PROPOSAL, ACK, and COMMIT messages and
collect the values of some variables of each server instance at the end of
each round. Apart from affecting message reception, my code does not affect
any other behavior of ZooKeeper.
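To illustrate the kind of interception described above, here is a minimal sketch in Python. It is not the code from the linked repository: the class `DropFilter`, its fields, and the `receive` method are hypothetical names chosen for illustration. It only shows the idea of dropping messages by type at reception while recording what was delivered and what was dropped.

```python
from dataclasses import dataclass, field


@dataclass
class DropFilter:
    """Hypothetical reception filter: drops configured message types."""

    dropped_types: set                      # e.g. {"PROPOSAL"} or {"ACK"}
    delivered: list = field(default_factory=list)
    dropped: list = field(default_factory=list)

    def receive(self, msg_type, sender, receiver):
        """Return True if the message reaches the receiver, False if dropped."""
        record = (msg_type, sender, receiver)
        if msg_type in self.dropped_types:
            self.dropped.append(record)
            return False
        self.delivered.append(record)
        return True


# Drop every ACK; PROPOSAL and COMMIT messages still get through.
ack_filter = DropFilter(dropped_types={"ACK"})
ack_filter.receive("PROPOSAL", "A", "B")   # delivered
ack_filter.receive("ACK", "B", "A")        # dropped
```

Only reception is affected in this sketch; the sender still behaves as if the message had been sent, which matches the stated intent of not changing any other behavior.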
Setup:
Ubuntu 22.04.2, Maven 3.9.0, Ant 1.10.13.
Replace the directory called "zookeeper-server" in ZooKeeper 3.5.8 with the
"zookeeper-server" in [my github repo|https://github.com/kry4tall/CC-ZOO358].
Build the modified ZooKeeper 3.5.8 with Ant to get zookeeper-3.5.8.jar, and
use it to replace the zookeeper-3.5.8.jar downloaded by Maven.
Create a directory called "states" and a file called
"[scenarios|https://github.com/kry4tall/CC-ZOO358/blob/krystal/zoo-tester/test/scenarios]".
Write the path to test.properties in zoo-tester's resource directory.
Use "-s scenario-X" (X = 1,2,3,4,5,6) as the startup parameter to run the main
method of ZooTester.
Base scenario:
Initially, start an ensemble of 3 servers called A, B, and C, initialize
2 znodes called /key0 and /key1, and set them to 0 and 1 respectively.
# Request to set /key0 to 1000 on 3 servers.
# *(Optional) Isolate the PROPOSE messages that the leader sends to the 2 followers.*
# *(Optional) Isolate the ACK messages that the 2 followers send to the leader.*
# (Optional) Stop all servers and then restart them.
# (Optional) Read /key0 and /key1 on all servers.
# Request to set /key1 to 1001 on 3 servers.
# (Optional) Stop all servers and then restart them.
# Read /key0 and /key1 on all servers.
Label the execution step lists as follows:
[1,2,5,6,8] as {*}scenario1{*}, [1,2,4,5,6,8] as {*}scenario2{*},
[1,2,5,6,7,8] as {*}scenario3{*}, [1,2,4,5,6,7,8] as {*}scenario4{*},
[1,2,6,8] as {*}scenario5{*}, [1,2,6,7,8] as {*}scenario6{*},
[1,3,5,6,8] as {*}scenario7{*}, [1,3,4,5,6,8] as {*}scenario8{*},
[1,3,5,6,7,8] as {*}scenario9{*}, [1,3,4,5,6,7,8] as {*}scenario10{*},
[1,3,6,8] as {*}scenario11{*}, and [1,3,6,7,8] as {*}scenario12{*}.
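The twelve step lists can be written down as data, which makes their structure easier to see. The sketch below is in Python and is only an illustration; the names `STEPS`, `SCENARIOS`, and `describe` are hypothetical and are not taken from zoo-tester or its scenarios file.

```python
# Step numbers and descriptions from the base scenario.
STEPS = {
    1: "Request to set /key0 to 1000",
    2: "Isolate PROPOSE messages from the leader to the 2 followers",
    3: "Isolate ACK messages from the 2 followers to the leader",
    4: "Stop all servers and restart them",
    5: "Read /key0 and /key1 on all servers",
    6: "Request to set /key1 to 1001",
    7: "Stop all servers and restart them",
    8: "Read /key0 and /key1 on all servers",
}

# Scenario name -> ordered list of step numbers, as listed in the report.
SCENARIOS = {
    "scenario1": [1, 2, 5, 6, 8],
    "scenario2": [1, 2, 4, 5, 6, 8],
    "scenario3": [1, 2, 5, 6, 7, 8],
    "scenario4": [1, 2, 4, 5, 6, 7, 8],
    "scenario5": [1, 2, 6, 8],
    "scenario6": [1, 2, 6, 7, 8],
    "scenario7": [1, 3, 5, 6, 8],
    "scenario8": [1, 3, 4, 5, 6, 8],
    "scenario9": [1, 3, 5, 6, 7, 8],
    "scenario10": [1, 3, 4, 5, 6, 7, 8],
    "scenario11": [1, 3, 6, 8],
    "scenario12": [1, 3, 6, 7, 8],
}


def describe(name):
    """Return the human-readable steps of one scenario, in execution order."""
    return [f"{n}. {STEPS[n]}" for n in SCENARIOS[name]]


for line in describe("scenario5"):
    print(line)
```

Written this way, scenarios 1 to 6 all drop PROPOSE messages (step 2), scenarios 7 to 12 all drop ACK messages (step 3), and every scenario contains both write requests (steps 1 and 6) and the final read (step 8).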
The output of these 12 scenarios is in the attachments.
Surprisingly, in some scenarios key0 is set to 1000 and key1 is set to 1001,
and restarting all servers affects the values of the znodes. For comparison, I
also attached the output of a run in which no ACK message is dropped.
Everything is fine when no ACK is dropped.
However, the values 1000 and 1001 should not appear in any znode: the proposal
for the first request cannot obtain enough ACKs, so it cannot be committed.
The values of the server-instance variables I collected confirm that the
proposal for the first request was uncommitted. Given this, and according to
the ZooKeeper source code, when no node restart occurs the servers will also
not commit the proposal for the second request while an earlier proposal is
still pending.
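The expected behavior argued above can be sketched as a small model in Python. This is a simplified illustration, not ZooKeeper's implementation: the class `LeaderSketch` and its methods are hypothetical, and the model assumes a strict-majority quorum, that the leader counts its own ACK, and that a proposal is never committed while any earlier proposal is still outstanding.

```python
class LeaderSketch:
    """Toy model of quorum counting and in-order commit (assumptions above)."""

    def __init__(self, ensemble_size):
        self.ensemble_size = ensemble_size
        self.outstanding = {}    # zxid -> set of server ids that have ACKed
        self.committed = []      # zxids in commit order

    def propose(self, zxid, leader_id):
        # Assumption: the leader's own ACK counts toward the quorum.
        self.outstanding[zxid] = {leader_id}

    def ack(self, zxid, server_id):
        if zxid in self.outstanding:
            self.outstanding[zxid].add(server_id)
            self.try_commit(zxid)

    def has_quorum(self, zxid):
        return len(self.outstanding[zxid]) > self.ensemble_size // 2

    def try_commit(self, zxid):
        if zxid not in self.outstanding:
            return False
        # In-order rule: an earlier pending proposal blocks this one.
        if any(earlier < zxid for earlier in self.outstanding):
            return False
        if not self.has_quorum(zxid):
            return False
        del self.outstanding[zxid]
        self.committed.append(zxid)
        return True


leader = LeaderSketch(ensemble_size=3)
leader.propose(1, "A")          # set /key0 to 1000; follower ACKs are dropped
leader.propose(2, "A")          # set /key1 to 1001
leader.ack(2, "B")
leader.ack(2, "C")
print(leader.committed)         # nothing committed: proposal 1 is still pending
```

Under these assumptions, the first proposal holds only the leader's own ACK (1 of 3), so it never reaches a quorum, and the second proposal stays blocked behind it even with all three ACKs. This is why neither 1000 nor 1001 is expected to become visible when no restart occurs.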
> Uncommitted requests have been executed
> ----------------------------------------
>
> Key: ZOOKEEPER-4681
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4681
> Project: ZooKeeper
> Issue Type: Bug
> Components: quorum
> Affects Versions: 3.5.8
> Reporter: krystal he
> Priority: Critical
> Attachments: zookeeper-no-drop.patch, zookeeper-scenario1.patch,
> zookeeper-scenario10.patch, zookeeper-scenario11.patch,
> zookeeper-scenario12.patch, zookeeper-scenario2.patch,
> zookeeper-scenario3.patch, zookeeper-scenario4.patch,
> zookeeper-scenario5.patch, zookeeper-scenario6.patch,
> zookeeper-scenario7.patch, zookeeper-scenario8.patch,
> zookeeper-scenario9.patch
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)