[
https://issues.apache.org/jira/browse/FLINK-28078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17558345#comment-17558345
]
Matthias Pohl commented on FLINK-28078:
---------------------------------------
I did a comparison of a successful run and the failed run. I'm getting the
feeling that we're running into some race condition on the {{LeaderLatch}}
implementation side.
Here are the logs of a successful run starting from the log message where the
latch zNode is checked for existence after being created:
{code:java}
1376 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor []
- Processing request:: sessionid:0x100009ef1b30000 type:exists cxid:0x1b
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/latch
1376 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor []
- sessionid:0x100009ef1b30000 type:exists cxid:0x1b zxid:0xfffffffffffffffe
txntype:unknown reqpath:/flink/default/latch
1377 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor []
- Processing request:: sessionid:0x100009ef1b30000 type:createContainer
cxid:0x1c zxid:0xa txntype:19 reqpath:n/a
1378 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor []
- sessionid:0x100009ef1b30000 type:createContainer cxid:0x1c zxid:0xa
txntype:19 reqpath:n/a
1380 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor []
- Processing request:: sessionid:0x100009ef1b30000 type:getChildren2 cxid:0x1d
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default
1380 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor []
- sessionid:0x100009ef1b30000 type:getChildren2 cxid:0x1d
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default
1380 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor []
- Processing request:: sessionid:0x100009ef1b30000 type:getChildren2 cxid:0x1e
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default
1380 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor []
- sessionid:0x100009ef1b30000 type:getChildren2 cxid:0x1e
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default
1380 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor []
- Processing request:: sessionid:0x100009ef1b30000 type:getChildren2 cxid:0x1f
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default
1380 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor []
- sessionid:0x100009ef1b30000 type:getChildren2 cxid:0x1f
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default
1381 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor []
- Processing request:: sessionid:0x100009ef1b30000 type:getChildren cxid:0x20
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/latch
1381 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor []
- sessionid:0x100009ef1b30000 type:getChildren cxid:0x20
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/latch
1385 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor []
- Processing request:: sessionid:0x100009ef1b30000 type:create2 cxid:0x21
zxid:0xb txntype:15 reqpath:n/a
1385 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor []
- sessionid:0x100009ef1b30000 type:create2 cxid:0x21 zxid:0xb txntype:15
reqpath:n/a
1385 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor []
- Processing request:: sessionid:0x100009ef1b30000 type:exists cxid:0x22
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink
1385 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor []
- sessionid:0x100009ef1b30000 type:exists cxid:0x22 zxid:0xfffffffffffffffe
txntype:unknown reqpath:/flink
1386 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor []
- Processing request:: sessionid:0x100009ef1b30000 type:exists cxid:0x23
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default
1386 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor []
- sessionid:0x100009ef1b30000 type:exists cxid:0x23 zxid:0xfffffffffffffffe
txntype:unknown reqpath:/flink/default
1387 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor []
- Processing request:: sessionid:0x100009ef1b30000 type:getChildren2 cxid:0x24
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/latch
1387 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor []
- sessionid:0x100009ef1b30000 type:getChildren2 cxid:0x24
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/latch
1387 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor []
- Processing request:: sessionid:0x100009ef1b30000 type:exists cxid:0x25
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/latch
1387 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor []
- sessionid:0x100009ef1b30000 type:exists cxid:0x25 zxid:0xfffffffffffffffe
txntype:unknown reqpath:/flink/default/latch
1388 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor []
- Processing request:: sessionid:0x100009ef1b30000 type:getChildren cxid:0x26
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/latch
1388 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor []
- sessionid:0x100009ef1b30000 type:getChildren cxid:0x26
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/latch
1390 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor []
- Processing request:: sessionid:0x100009ef1b30000 type:create2 cxid:0x27
zxid:0xc txntype:15 reqpath:n/a
1391 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor []
- sessionid:0x100009ef1b30000 type:create2 cxid:0x27 zxid:0xc txntype:15
reqpath:n/a
1391 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor []
- Processing request:: sessionid:0x100009ef1b30000 type:exists cxid:0x28
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink
1391 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor []
- sessionid:0x100009ef1b30000 type:exists cxid:0x28 zxid:0xfffffffffffffffe
txntype:unknown reqpath:/flink
1391 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor []
- Processing request:: sessionid:0x100009ef1b30000 type:exists cxid:0x29
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default
1391 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor []
- sessionid:0x100009ef1b30000 type:exists cxid:0x29 zxid:0xfffffffffffffffe
txntype:unknown reqpath:/flink/default
1392 [ForkJoinPool-1-worker-9-EventThread] DEBUG
org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriver
[] - ZooKeeperMultipleComponentLeaderElectionDriver obtained the leadership.
1392 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor []
- Processing request:: sessionid:0x100009ef1b30000 type:getChildren2 cxid:0x2a
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/latch
1392 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor []
- sessionid:0x100009ef1b30000 type:getChildren2 cxid:0x2a
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/latch
1393 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor []
- Processing request:: sessionid:0x100009ef1b30000 type:exists cxid:0x2b
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/latch
1393 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor []
- sessionid:0x100009ef1b30000 type:exists cxid:0x2b zxid:0xfffffffffffffffe
txntype:unknown reqpath:/flink/default/latch
1393 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor []
- Processing request:: sessionid:0x100009ef1b30000 type:getChildren cxid:0x2c
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/latch
1393 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor []
- sessionid:0x100009ef1b30000 type:getChildren cxid:0x2c
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/latch
1394 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor []
- Processing request:: sessionid:0x100009ef1b30000 type:create2 cxid:0x2d
zxid:0xd txntype:15 reqpath:n/a
1395 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor []
- sessionid:0x100009ef1b30000 type:create2 cxid:0x2d zxid:0xd txntype:15
reqpath:n/a
1421 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor []
- Processing request:: sessionid:0x100009ef1b30000 type:getData cxid:0x2e
zxid:0xfffffffffffffffe txntype:unknown
reqpath:/flink/default/latch/_c_2b57761c-31e0-458d-9466-b409010d3d14-latch-0000000000
1421 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor []
- sessionid:0x100009ef1b30000 type:getData cxid:0x2e zxid:0xfffffffffffffffe
txntype:unknown
reqpath:/flink/default/latch/_c_2b57761c-31e0-458d-9466-b409010d3d14-latch-0000000000
1421 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor []
- Processing request:: sessionid:0x100009ef1b30000 type:getChildren2 cxid:0x2f
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/latch
1421 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor []
- sessionid:0x100009ef1b30000 type:getChildren2 cxid:0x2f
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/latch
1422 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor []
- Processing request:: sessionid:0x100009ef1b30000 type:getData cxid:0x30
zxid:0xfffffffffffffffe txntype:unknown
reqpath:/flink/default/latch/_c_1e836e28-a9d1-4bea-97f3-222034044247-latch-0000000001
1422 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor []
- sessionid:0x100009ef1b30000 type:getData cxid:0x30 zxid:0xfffffffffffffffe
txntype:unknown
reqpath:/flink/default/latch/_c_1e836e28-a9d1-4bea-97f3-222034044247-latch-0000000001
1497 [ForkJoinPool-1-worker-9] INFO
org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriver
[] - Closing ZooKeeperMultipleComponentLeaderElectionDriver.
1501 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor []
- Processing request:: sessionid:0x100009ef1b30000 type:checkWatches cxid:0x31
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default
1501 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor []
- sessionid:0x100009ef1b30000 type:checkWatches cxid:0x31
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default
1505 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor []
- Processing request:: sessionid:0x100009ef1b30000 type:delete cxid:0x32
zxid:0xe txntype:2 reqpath:n/a
1506 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor []
- sessionid:0x100009ef1b30000 type:delete cxid:0x32 zxid:0xe txntype:2
reqpath:n/a
1506 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor []
- Processing request:: sessionid:0x100009ef1b30000 type:getChildren2 cxid:0x33
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/latch
1506 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor []
- sessionid:0x100009ef1b30000 type:getChildren2 cxid:0x33
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/latch
1507 [ForkJoinPool-1-worker-9-EventThread] DEBUG
org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriver
[] - ZooKeeperMultipleComponentLeaderElectionDriver obtained the leadership.
1507 [ForkJoinPool-1-worker-9] INFO
org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriver
[] - Closing ZooKeeperMultipleComponentLeaderElectionDriver.
1508 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor []
- Processing request:: sessionid:0x100009ef1b30000 type:checkWatches cxid:0x34
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default
1508 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor []
- sessionid:0x100009ef1b30000 type:checkWatches cxid:0x34
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default
1508 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor []
- Processing request:: sessionid:0x100009ef1b30000 type:delete cxid:0x35
zxid:0xf txntype:2 reqpath:n/a
1508 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor []
- sessionid:0x100009ef1b30000 type:delete cxid:0x35 zxid:0xf txntype:2
reqpath:n/a
1509 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor []
- Processing request:: sessionid:0x100009ef1b30000 type:getChildren2 cxid:0x36
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/latch
1509 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor []
- sessionid:0x100009ef1b30000 type:getChildren2 cxid:0x36
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/latch
1509 [ForkJoinPool-1-worker-9-EventThread] DEBUG
org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriver
[] - ZooKeeperMultipleComponentLeaderElectionDriver obtained the leadership.
1510 [ForkJoinPool-1-worker-9] INFO
org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriver
[] - Closing ZooKeeperMultipleComponentLeaderElectionDriver.
1510 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor []
- Processing request:: sessionid:0x100009ef1b30000 type:checkWatches cxid:0x37
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default
1510 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor []
- sessionid:0x100009ef1b30000 type:checkWatches cxid:0x37
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default
1510 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor []
- Processing request:: sessionid:0x100009ef1b30000 type:delete cxid:0x38
zxid:0x10 txntype:2 reqpath:n/a
1511 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor []
- sessionid:0x100009ef1b30000 type:delete cxid:0x38 zxid:0x10 txntype:2
reqpath:n/a
1511 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor []
- Processing request:: sessionid:0x100009ef1b30000 type:checkWatches cxid:0x39
zxid:0xfffffffffffffffe txntype:unknown reqpath:/zookeeper/config
1511 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor []
- sessionid:0x100009ef1b30000 type:checkWatches cxid:0x39
zxid:0xfffffffffffffffe txntype:unknown reqpath:/zookeeper/config
1513 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor []
- Processing request:: sessionid:0x100009ef1b30000 type:closeSession cxid:0x3a
zxid:0x11 txntype:-11 reqpath:n/a
1513 [SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor []
- sessionid:0x100009ef1b30000 type:closeSession cxid:0x3a zxid:0x11 txntype:-11
reqpath:n/a
{code}
and here are the merged (i.e. I integrated the test-sides logs into the
zookeeper-server logs based on the time) logs:
{code:java}
16:17:07,824 [Curator-ConnectionStateManager-0] DEBUG
org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriver
[] - Connected to ZooKeeper quorum. Leader election can start.
16:17:07,824 [Curator-ConnectionStateManager-0] DEBUG
org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriver
[] - Connected to ZooKeeper quorum. Leader election can start.
16:17:07,829 [ SyncThread:0] DEBUG
org.apache.zookeeper.server.FinalRequestProcessor [] - Processing
request:: sessionid:0x100cf6d9cf60000 type:exists cxid:0x10
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/latch
16:17:07,829 [ SyncThread:0] DEBUG
org.apache.zookeeper.server.FinalRequestProcessor [] -
sessionid:0x100cf6d9cf60000 type:exists cxid:0x10 zxid:0xfffffffffffffffe
txntype:unknown reqpath:/flink/default/latch
16:17:07,831 [ SyncThread:0] DEBUG
org.apache.zookeeper.server.FinalRequestProcessor [] - Processing
request:: sessionid:0x100cf6d9cf60000 type:getData cxid:0x11
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default
16:17:07,832 [ SyncThread:0] DEBUG
org.apache.zookeeper.server.FinalRequestProcessor [] -
sessionid:0x100cf6d9cf60000 type:getData cxid:0x11 zxid:0xfffffffffffffffe
txntype:unknown reqpath:/flink/default
16:17:07,837 [ SyncThread:0] DEBUG
org.apache.zookeeper.server.FinalRequestProcessor [] - Processing
request:: sessionid:0x100cf6d9cf60000 type:getChildren2 cxid:0x12
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default
16:17:07,837 [ SyncThread:0] DEBUG
org.apache.zookeeper.server.FinalRequestProcessor [] -
sessionid:0x100cf6d9cf60000 type:getChildren2 cxid:0x12 zxid:0xfffffffffffffffe
txntype:unknown reqpath:/flink/default
16:17:07,838 [ SyncThread:0] DEBUG
org.apache.zookeeper.server.FinalRequestProcessor [] - Processing
request:: sessionid:0x100cf6d9cf60000 type:createContainer cxid:0x13 zxid:0x7
txntype:19 reqpath:n/a
16:17:07,838 [ SyncThread:0] DEBUG
org.apache.zookeeper.server.FinalRequestProcessor [] -
sessionid:0x100cf6d9cf60000 type:createContainer cxid:0x13 zxid:0x7 txntype:19
reqpath:n/a
16:17:07,839 [ SyncThread:0] DEBUG
org.apache.zookeeper.server.FinalRequestProcessor [] - Processing
request:: sessionid:0x100cf6d9cf60000 type:create2 cxid:0x14 zxid:0x8
txntype:15 reqpath:n/a
16:17:07,839 [ SyncThread:0] DEBUG
org.apache.zookeeper.server.FinalRequestProcessor [] -
sessionid:0x100cf6d9cf60000 type:create2 cxid:0x14 zxid:0x8 txntype:15
reqpath:n/a
16:17:07,839 [ SyncThread:0] DEBUG
org.apache.zookeeper.server.FinalRequestProcessor [] - Processing
request:: sessionid:0x100cf6d9cf60000 type:getData cxid:0x15
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default
16:17:07,839 [ SyncThread:0] DEBUG
org.apache.zookeeper.server.FinalRequestProcessor [] -
sessionid:0x100cf6d9cf60000 type:getData cxid:0x15 zxid:0xfffffffffffffffe
txntype:unknown reqpath:/flink/default
16:17:07,839 [ SyncThread:0] DEBUG
org.apache.zookeeper.server.FinalRequestProcessor [] - Processing
request:: sessionid:0x100cf6d9cf60000 type:getChildren2 cxid:0x16
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default
16:17:07,839 [ SyncThread:0] DEBUG
org.apache.zookeeper.server.FinalRequestProcessor [] -
sessionid:0x100cf6d9cf60000 type:getChildren2 cxid:0x16 zxid:0xfffffffffffffffe
txntype:unknown reqpath:/flink/default
16:17:07,846 [ SyncThread:0] DEBUG
org.apache.zookeeper.server.FinalRequestProcessor [] - Processing
request:: sessionid:0x100cf6d9cf60000 type:create2 cxid:0x17 zxid:0x9
txntype:15 reqpath:n/a
16:17:07,846 [ SyncThread:0] DEBUG
org.apache.zookeeper.server.FinalRequestProcessor [] -
sessionid:0x100cf6d9cf60000 type:create2 cxid:0x17 zxid:0x9 txntype:15
reqpath:n/a
16:17:07,846 [ SyncThread:0] DEBUG
org.apache.zookeeper.server.FinalRequestProcessor [] - Processing
request:: sessionid:0x100cf6d9cf60000 type:getChildren cxid:0x18
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/latch
16:17:07,846 [ SyncThread:0] DEBUG
org.apache.zookeeper.server.FinalRequestProcessor [] -
sessionid:0x100cf6d9cf60000 type:getChildren cxid:0x18 zxid:0xfffffffffffffffe
txntype:unknown reqpath:/flink/default/latch
16:17:07,846 [ SyncThread:0] DEBUG
org.apache.zookeeper.server.FinalRequestProcessor [] - Processing
request:: sessionid:0x100cf6d9cf60000 type:getChildren2 cxid:0x19
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default
16:17:07,846 [ SyncThread:0] DEBUG
org.apache.zookeeper.server.FinalRequestProcessor [] -
sessionid:0x100cf6d9cf60000 type:getChildren2 cxid:0x19 zxid:0xfffffffffffffffe
txntype:unknown reqpath:/flink/default
16:17:07,846 [ SyncThread:0] DEBUG
org.apache.zookeeper.server.FinalRequestProcessor [] - Processing
request:: sessionid:0x100cf6d9cf60000 type:getChildren2 cxid:0x1a
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default
16:17:07,846 [ SyncThread:0] DEBUG
org.apache.zookeeper.server.FinalRequestProcessor [] -
sessionid:0x100cf6d9cf60000 type:getChildren2 cxid:0x1a zxid:0xfffffffffffffffe
txntype:unknown reqpath:/flink/default
16:17:07,846 [ SyncThread:0] DEBUG
org.apache.zookeeper.server.FinalRequestProcessor [] - Processing
request:: sessionid:0x100cf6d9cf60000 type:getChildren2 cxid:0x1b
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/latch
16:17:07,846 [ SyncThread:0] DEBUG
org.apache.zookeeper.server.FinalRequestProcessor [] -
sessionid:0x100cf6d9cf60000 type:getChildren2 cxid:0x1b zxid:0xfffffffffffffffe
txntype:unknown reqpath:/flink/default/latch
16:17:07,848 [ForkJoinPool-45-worker-25-EventThread] DEBUG
org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriver
[] - ZooKeeperMultipleComponentLeaderElectionDriver obtained the leadership.
16:17:07,854 [ SyncThread:0] DEBUG
org.apache.zookeeper.server.FinalRequestProcessor [] - Processing
request:: sessionid:0x100cf6d9cf60000 type:getChildren2 cxid:0x1c
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/latch
16:17:07,854 [ SyncThread:0] DEBUG
org.apache.zookeeper.server.FinalRequestProcessor [] -
sessionid:0x100cf6d9cf60000 type:getChildren2 cxid:0x1c zxid:0xfffffffffffffffe
txntype:unknown reqpath:/flink/default/latch
16:17:07,848 [ForkJoinPool-45-worker-25-EventThread] DEBUG
org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriver
[] - ZooKeeperMultipleComponentLeaderElectionDriver obtained the leadership.
16:17:07,860 [ForkJoinPool-45-worker-25] INFO
org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriver
[] - Closing ZooKeeperMultipleComponentLeaderElectionDriver.
16:17:07,860 [ SyncThread:0] DEBUG
org.apache.zookeeper.server.FinalRequestProcessor [] - Processing
request:: sessionid:0x100cf6d9cf60000 type:create2 cxid:0x1d zxid:0xa
txntype:15 reqpath:n/a
16:17:07,860 [ SyncThread:0] DEBUG
org.apache.zookeeper.server.FinalRequestProcessor [] -
sessionid:0x100cf6d9cf60000 type:create2 cxid:0x1d zxid:0xa txntype:15
reqpath:n/a
16:17:07,862 [ SyncThread:0] DEBUG
org.apache.zookeeper.server.FinalRequestProcessor [] - Processing
request:: sessionid:0x100cf6d9cf60000 type:checkWatches cxid:0x1e
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default
16:17:07,862 [ SyncThread:0] DEBUG
org.apache.zookeeper.server.FinalRequestProcessor [] -
sessionid:0x100cf6d9cf60000 type:checkWatches cxid:0x1e zxid:0xfffffffffffffffe
txntype:unknown reqpath:/flink/default
16:17:07,863 [ SyncThread:0] DEBUG
org.apache.zookeeper.server.FinalRequestProcessor [] - Processing
request:: sessionid:0x100cf6d9cf60000 type:delete cxid:0x1f zxid:0xb txntype:2
reqpath:n/a
16:17:07,863 [ SyncThread:0] DEBUG
org.apache.zookeeper.server.FinalRequestProcessor [] -
sessionid:0x100cf6d9cf60000 type:delete cxid:0x1f zxid:0xb txntype:2 reqpath:n/a
16:17:07,863 [ SyncThread:0] DEBUG
org.apache.zookeeper.server.FinalRequestProcessor [] - Processing
request:: sessionid:0x100cf6d9cf60000 type:getData cxid:0x20
zxid:0xfffffffffffffffe txntype:unknown
reqpath:/flink/default/latch/_c_ddb7327c-1775-4084-9fbd-40e263758697-latch-0000000000
16:17:07,863 [ SyncThread:0] DEBUG
org.apache.zookeeper.server.FinalRequestProcessor [] -
sessionid:0x100cf6d9cf60000 type:getData cxid:0x20 zxid:0xfffffffffffffffe
txntype:unknown
reqpath:/flink/default/latch/_c_ddb7327c-1775-4084-9fbd-40e263758697-latch-0000000000
16:17:07,864 [ SyncThread:0] DEBUG
org.apache.zookeeper.server.FinalRequestProcessor [] - Processing
request:: sessionid:0x100cf6d9cf60000 type:getChildren2 cxid:0x21
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/latch
16:17:07,864 [ SyncThread:0] DEBUG
org.apache.zookeeper.server.FinalRequestProcessor [] -
sessionid:0x100cf6d9cf60000 type:getChildren2 cxid:0x21 zxid:0xfffffffffffffffe
txntype:unknown reqpath:/flink/default/latch
16:17:07,866 [ SyncThread:0] DEBUG
org.apache.zookeeper.server.FinalRequestProcessor [] - Processing
request:: sessionid:0x100cf6d9cf60000 type:delete cxid:0x22 zxid:0xc txntype:2
reqpath:n/a
16:17:07,866 [ SyncThread:0] DEBUG
org.apache.zookeeper.server.FinalRequestProcessor [] -
sessionid:0x100cf6d9cf60000 type:delete cxid:0x22 zxid:0xc txntype:2 reqpath:n/a
16:17:07,869 [ SyncThread:0] DEBUG
org.apache.zookeeper.server.FinalRequestProcessor [] - Processing
request:: sessionid:0x100cf6d9cf60000 type:create2 cxid:0x23 zxid:0xd
txntype:15 reqpath:n/a
16:17:07,869 [ SyncThread:0] DEBUG
org.apache.zookeeper.server.FinalRequestProcessor [] -
sessionid:0x100cf6d9cf60000 type:create2 cxid:0x23 zxid:0xd txntype:15
reqpath:n/a
16:17:07,869 [ SyncThread:0] DEBUG
org.apache.zookeeper.server.FinalRequestProcessor [] - Processing
request:: sessionid:0x100cf6d9cf60000 type:getData cxid:0x24
zxid:0xfffffffffffffffe txntype:unknown
reqpath:/flink/default/latch/_c_6eb174e9-bb77-4a73-9604-531242c11c0e-latch-0000000001
16:17:07,869 [ SyncThread:0] DEBUG
org.apache.zookeeper.server.FinalRequestProcessor [] -
sessionid:0x100cf6d9cf60000 type:getData cxid:0x24 zxid:0xfffffffffffffffe
txntype:unknown
reqpath:/flink/default/latch/_c_6eb174e9-bb77-4a73-9604-531242c11c0e-latch-0000000001
16:17:07,871 [ SyncThread:0] DEBUG
org.apache.zookeeper.server.FinalRequestProcessor [] - Processing
request:: sessionid:0x100cf6d9cf60000 type:getChildren2 cxid:0x25
zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/latch
16:17:07,871 [ SyncThread:0] DEBUG
org.apache.zookeeper.server.FinalRequestProcessor [] -
sessionid:0x100cf6d9cf60000 type:getChildren2 cxid:0x25 zxid:0xfffffffffffffffe
txntype:unknown reqpath:/flink/default/latch
16:17:07,875 [ SyncThread:0] DEBUG
org.apache.zookeeper.server.FinalRequestProcessor [] - Processing
request:: sessionid:0x100cf6d9cf60000 type:delete cxid:0x26 zxid:0xe txntype:2
reqpath:n/a
16:17:07,875 [ SyncThread:0] DEBUG
org.apache.zookeeper.server.FinalRequestProcessor [] -
sessionid:0x100cf6d9cf60000 type:delete cxid:0x26 zxid:0xe txntype:2 reqpath:n/a
16:17:07,876 [ SyncThread:0] DEBUG
org.apache.zookeeper.server.FinalRequestProcessor [] - Processing
request:: sessionid:0x100cf6d9cf60000 type:create2 cxid:0x27 zxid:0xf
txntype:15 reqpath:n/a
16:17:07,876 [ SyncThread:0] DEBUG
org.apache.zookeeper.server.FinalRequestProcessor [] -
sessionid:0x100cf6d9cf60000 type:create2 cxid:0x27 zxid:0xf txntype:15
reqpath:n/a
16:17:07,876 [ SyncThread:0] DEBUG
org.apache.zookeeper.server.FinalRequestProcessor [] - Processing
request:: sessionid:0x100cf6d9cf60000 type:getData cxid:0x28
zxid:0xfffffffffffffffe txntype:unknown
reqpath:/flink/default/latch/_c_8008ceb9-e0b9-459b-8fa5-44428fa31e29-latch-0000000002
16:17:07,876 [ SyncThread:0] DEBUG
org.apache.zookeeper.server.FinalRequestProcessor [] -
sessionid:0x100cf6d9cf60000 type:getData cxid:0x28 zxid:0xfffffffffffffffe
txntype:unknown
reqpath:/flink/default/latch/_c_8008ceb9-e0b9-459b-8fa5-44428fa31e29-latch-0000000002
[...]
{code}
The LeaderLatch implementation works like that (based on the code):
# {{LeaderLatch}} is started pointing to a given zNode path
({{{}/flink/default/latch/{}}} in our case).
# The instance will get {{reset()}} initially setting the leadership for this
latch to {{false}} and triggering a child creation
# Child creation is done "withProtection" (i.e. a random String will be
prefixed to the child) in mode {{EPHEMERAL_SEQUENTIAL}} (i.e. the child will be
deleted on connection-loss and a monotonically increasing number is added as a
suffix) with a callback that triggers {{getChildren}} in case of success and
selecting the leader based on the suffix of the child nodes names (lower ID
wins). Hence, an order between the children is established based on the ID.
# {{getChildren}} callback triggers leadership detection in the local
{{LeaderLatch}} instance.
## For the child with the lowest ID, the leadership is triggered in the
callback
## If the current {{LeaderLatch}} doesn't correspond to the child with the
lowest ID, a watcher is set up watching the direct predecessor (based on the
ID-based ordering I mentioned before) for deletion. If the deletion happens,
the watcher is triggered (only once, i.e. not repeatedly) and reinitiates the
leadership detection through {{getChildren}} (4.) if the {{LeaderLatch}} isn't
closed, yet. This is what we observe in the successful test run with the
watcher being triggered.
There is another code path if the predecessor node doesn't exist in
{{getChildren}} (4.). In that case, an entire {{reset()}} (2.) is triggered
resulting in the deletion of the current's {{LeaderLatch}}'s zNode and the
recreation of the child zNode. It appears that we're ending up in a this code
path for the unsuccessful test run.
> ZooKeeperMultipleComponentLeaderElectionDriverTest.testLeaderElectionWithMultipleDrivers
> runs into timeout
> ----------------------------------------------------------------------------------------------------------
>
> Key: FLINK-28078
> URL: https://issues.apache.org/jira/browse/FLINK-28078
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.16.0
> Reporter: Matthias Pohl
> Assignee: Matthias Pohl
> Priority: Major
> Labels: test-stability
>
> [Build
> #36189|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=36189&view=logs&j=0da23115-68bb-5dcd-192c-bd4c8adebde1&t=24c3384f-1bcb-57b3-224f-51bf973bbee8&l=10455]
> got stuck in
> {{ZooKeeperMultipleComponentLeaderElectionDriverTest.testLeaderElectionWithMultipleDrivers}}
> {code}
> "ForkJoinPool-45-worker-25" #525 daemon prio=5 os_prio=0
> tid=0x00007fc74d9e3800 nid=0x62c8 waiting on condition [0x00007fc6ff2f2000]
> May 30 16:36:10 java.lang.Thread.State: WAITING (parking)
> May 30 16:36:10 at sun.misc.Unsafe.park(Native Method)
> May 30 16:36:10 - parking to wait for <0x00000000c2571b80> (a
> java.util.concurrent.CompletableFuture$Signaller)
> May 30 16:36:10 at
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> May 30 16:36:10 at
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
> May 30 16:36:10 at
> java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3313)
> May 30 16:36:10 at
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)
> May 30 16:36:10 at
> java.util.concurrent.CompletableFuture.join(CompletableFuture.java:1947)
> May 30 16:36:10 at
> org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriverTest.testLeaderElectionWithMultipleDrivers(ZooKeeperMultipleComponentLeaderElectionDriverTest.java:256)
> May 30 16:36:10 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
> Method)
> May 30 16:36:10 at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> May 30 16:36:10 at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> May 30 16:36:10 at java.lang.reflect.Method.invoke(Method.java:498)
> [...]
> {code}
>
--
This message was sent by Atlassian Jira
(v8.20.7#820007)