[ 
https://issues.apache.org/jira/browse/FLINK-30484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17651651#comment-17651651
 ] 

Matthias Pohl edited comment on FLINK-30484 at 12/23/22 1:01 PM:
-----------------------------------------------------------------

This issue has the same root cause as FLINK-28078: we run into CURATOR-645 
because we're reusing the LeaderLatch client. We could either add a sleep 
(similar to the fix for FLINK-28078) or stop reusing the LeaderLatch client. 
Alternatively, we could upgrade to Curator 5.4.0, where CURATOR-645 is fixed 
(FLINK-29173).

Extract from the {{zookeeper-server-1.log}} of the failed build:
{code}
[...]
04:44:47,525 [        SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor            [] - Processing request:: sessionid:0x1000026f4a00000 type:getData cxid:0xe0 zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/latch/_c_73c184d1-e0c3-4884-9476-e55cf87c5963-latch-0000000007
04:44:47,525 [        SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor            [] - sessionid:0x1000026f4a00000 type:getData cxid:0xe0 zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/latch/_c_73c184d1-e0c3-4884-9476-e55cf87c5963-latch-0000000007
04:44:47,525 [        SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor            [] - Processing request:: sessionid:0x1000026f4a00000 type:getChildren2 cxid:0xe1 zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/latch
04:44:47,525 [        SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor            [] - sessionid:0x1000026f4a00000 type:getChildren2 cxid:0xe1 zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/latch
04:44:47,526 [        SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor            [] - Processing request:: sessionid:0x1000026f4a00000 type:delete cxid:0xe2 zxid:0x37 txntype:2 reqpath:n/a
04:44:47,526 [        SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor            [] - sessionid:0x1000026f4a00000 type:delete cxid:0xe2 zxid:0x37 txntype:2 reqpath:n/a
04:44:47,526 [        SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor            [] - Processing request:: sessionid:0x1000026f4a00000 type:create2 cxid:0xe3 zxid:0x38 txntype:15 reqpath:n/a
04:44:47,526 [        SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor            [] - sessionid:0x1000026f4a00000 type:create2 cxid:0xe3 zxid:0x38 txntype:15 reqpath:n/a
04:44:47,527 [        SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor            [] - Processing request:: sessionid:0x1000026f4a00000 type:getData cxid:0xe4 zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/latch/_c_175e24a6-59f8-486b-a6c6-3630c0cd00bd-latch-0000000008
04:44:47,527 [        SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor            [] - sessionid:0x1000026f4a00000 type:getData cxid:0xe4 zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/latch/_c_175e24a6-59f8-486b-a6c6-3630c0cd00bd-latch-0000000008
04:44:47,527 [        SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor            [] - Processing request:: sessionid:0x1000026f4a00000 type:getChildren2 cxid:0xe5 zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/latch
04:44:47,527 [        SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor            [] - sessionid:0x1000026f4a00000 type:getChildren2 cxid:0xe5 zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/latch
04:44:47,528 [        SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor            [] - Processing request:: sessionid:0x1000026f4a00000 type:delete cxid:0xe6 zxid:0x39 txntype:2 reqpath:n/a
04:44:47,528 [        SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor            [] - sessionid:0x1000026f4a00000 type:delete cxid:0xe6 zxid:0x39 txntype:2 reqpath:n/a
04:44:47,528 [        SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor            [] - Processing request:: sessionid:0x1000026f4a00000 type:create2 cxid:0xe7 zxid:0x3a txntype:15 reqpath:n/a
04:44:47,528 [        SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor            [] - sessionid:0x1000026f4a00000 type:create2 cxid:0xe7 zxid:0x3a txntype:15 reqpath:n/a
04:44:47,528 [        SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor            [] - Processing request:: sessionid:0x1000026f4a00000 type:getData cxid:0xe8 zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/latch/_c_707dcb78-1f57-497b-8859-c18258fb2e7e-latch-0000000009
04:44:47,529 [        SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor            [] - sessionid:0x1000026f4a00000 type:getData cxid:0xe8 zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/latch/_c_707dcb78-1f57-497b-8859-c18258fb2e7e-latch-0000000009
04:44:47,529 [        SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor            [] - Processing request:: sessionid:0x1000026f4a00000 type:getChildren2 cxid:0xe9 zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/latch
04:44:47,529 [        SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor            [] - sessionid:0x1000026f4a00000 type:getChildren2 cxid:0xe9 zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/latch
04:44:47,530 [        SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor            [] - Processing request:: sessionid:0x1000026f4a00000 type:delete cxid:0xea zxid:0x3b txntype:2 reqpath:n/a
04:44:47,530 [        SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor            [] - sessionid:0x1000026f4a00000 type:delete cxid:0xea zxid:0x3b txntype:2 reqpath:n/a
04:44:47,531 [        SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor            [] - Processing request:: sessionid:0x1000026f4a00000 type:create2 cxid:0xeb zxid:0x3c txntype:15 reqpath:n/a
04:44:47,531 [        SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor            [] - sessionid:0x1000026f4a00000 type:create2 cxid:0xeb zxid:0x3c txntype:15 reqpath:n/a
04:44:47,531 [        SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor            [] - Processing request:: sessionid:0x1000026f4a00000 type:getData cxid:0xec zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/latch/_c_87c2ef30-fe84-410f-a2d0-39a634e3a463-latch-0000000010
[...]
{code}
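
To make the pattern in the extract easier to spot, here is a minimal sketch that pulls the latch sequence numbers out of such log lines. It is not part of Flink, Curator or ZooKeeper; the class name {{LatchSequenceExtractor}} and the method {{extractSequences}} are hypothetical and only serve as illustration. In the extract above the sequence numbers climb from 7 to 10 within roughly 6 ms, with a delete and a create2 between each getData, which suggests that the latch node is being deleted and recreated in a tight loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for reading the log extract; not an existing Flink or Curator class.
public class LatchSequenceExtractor {

    // Matches the 10-digit sequence suffix of a latch znode path,
    // e.g. "reqpath:/flink/default/latch/_c_<uuid>-latch-0000000007".
    // The parent path "/flink/default/latch" and "reqpath:n/a" do not match.
    private static final Pattern LATCH_NODE =
            Pattern.compile("reqpath:\\S*-latch-(\\d{10})");

    /** Returns the distinct latch sequence numbers in order of first appearance. */
    public static List<Integer> extractSequences(String log) {
        List<Integer> sequences = new ArrayList<>();
        Matcher matcher = LATCH_NODE.matcher(log);
        while (matcher.find()) {
            int sequence = Integer.parseInt(matcher.group(1));
            // Each request is logged twice, so skip numbers we have already seen.
            if (!sequences.contains(sequence)) {
                sequences.add(sequence);
            }
        }
        return sequences;
    }

    public static void main(String[] args) {
        // Shortened lines from the extract (timestamp and logger prefix removed).
        String log = String.join("\n",
                "sessionid:0x1000026f4a00000 type:getData cxid:0xe0 zxid:0xfffffffffffffffe txntype:unknown "
                        + "reqpath:/flink/default/latch/_c_73c184d1-e0c3-4884-9476-e55cf87c5963-latch-0000000007",
                "sessionid:0x1000026f4a00000 type:getChildren2 cxid:0xe1 zxid:0xfffffffffffffffe txntype:unknown "
                        + "reqpath:/flink/default/latch",
                "sessionid:0x1000026f4a00000 type:delete cxid:0xe2 zxid:0x37 txntype:2 reqpath:n/a",
                "sessionid:0x1000026f4a00000 type:create2 cxid:0xe3 zxid:0x38 txntype:15 reqpath:n/a",
                "sessionid:0x1000026f4a00000 type:getData cxid:0xe4 zxid:0xfffffffffffffffe txntype:unknown "
                        + "reqpath:/flink/default/latch/_c_175e24a6-59f8-486b-a6c6-3630c0cd00bd-latch-0000000008");

        System.out.println(extractSequences(log)); // prints [7, 8]
    }
}
```

Running the same method over the full {{zookeeper-server-1.log}} would show how long the sequence keeps growing; the sketch only assumes that latch nodes end in {{-latch-}} followed by a 10-digit sequence number, as they do in the extract.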



> ZooKeeperLeaderElectionTest.testZooKeeperReelection timed out
> -------------------------------------------------------------
>
>                 Key: FLINK-30484
>                 URL: https://issues.apache.org/jira/browse/FLINK-30484
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.15.3
>            Reporter: Matthias Pohl
>            Priority: Major
>              Labels: test-stability
>
> {{ZooKeeperLeaderElectionTest.testZooKeeperReelection}} timed out in 
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=44161&view=logs&j=4d4a0d10-fca2-5507-8eed-c07f0bdf4887&t=7b25afdf-cc6c-566f-5459-359dc2585798&l=15416
> {code}
> Dec 22 05:00:20 "main" #1 prio=5 os_prio=0 tid=0x00007f1c7c00b800 nid=0x1ebdc waiting on condition [0x00007f1c82b31000]
> Dec 22 05:00:20    java.lang.Thread.State: WAITING (parking)
> Dec 22 05:00:20       at sun.misc.Unsafe.park(Native Method)
> Dec 22 05:00:20       - parking to wait for  <0x000000008070b7c0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> Dec 22 05:00:20       at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> Dec 22 05:00:20       at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
> Dec 22 05:00:20       at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> Dec 22 05:00:20       at org.apache.flink.runtime.leaderelection.TestingRetrievalBase.lambda$waitForNewLeader$0(TestingRetrievalBase.java:50)
> Dec 22 05:00:20       at org.apache.flink.runtime.leaderelection.TestingRetrievalBase$$Lambda$310/1033917063.get(Unknown Source)
> Dec 22 05:00:20       at org.apache.flink.runtime.testutils.CommonTestUtils.waitUntilCondition(CommonTestUtils.java:144)
> Dec 22 05:00:20       at org.apache.flink.runtime.testutils.CommonTestUtils.waitUntilCondition(CommonTestUtils.java:138)
> Dec 22 05:00:20       at org.apache.flink.runtime.leaderelection.TestingRetrievalBase.waitForNewLeader(TestingRetrievalBase.java:48)
> Dec 22 05:00:20       at org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionTest.testZooKeeperReelection(ZooKeeperLeaderElectionTest.java:238)
> {code}


