[ https://issues.apache.org/jira/browse/FLINK-28078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17556863#comment-17556863 ]
Matthias Pohl commented on FLINK-28078:
---------------------------------------
{code}
16:17:07,802 [ForkJoinPool-45-worker-25] INFO org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.CuratorFrameworkImpl [] - Starting
16:17:07,804 [ForkJoinPool-45-worker-25] INFO org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.CuratorFrameworkImpl [] - Default schema
16:17:07,814 [ForkJoinPool-45-worker-25-EventThread] INFO org.apache.flink.shaded.curator5.org.apache.curator.framework.state.ConnectionStateManager [] - State change: CONNECTED
16:17:07,817 [ForkJoinPool-45-worker-25-EventThread] INFO org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker [] - New config event received: {}
16:17:07,824 [Curator-ConnectionStateManager-0] DEBUG org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriver [] - Connected to ZooKeeper quorum. Leader election can start.
16:17:07,824 [Curator-ConnectionStateManager-0] DEBUG org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriver [] - Connected to ZooKeeper quorum. Leader election can start.
16:17:07,826 [ForkJoinPool-45-worker-25-EventThread] INFO org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker [] - New config event received: {}
16:17:07,848 [ForkJoinPool-45-worker-25-EventThread] DEBUG org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriver [] - ZooKeeperMultipleComponentLeaderElectionDriver obtained the leadership.
16:17:07,860 [ForkJoinPool-45-worker-25] INFO org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriver [] - Closing ZooKeeperMultipleComponentLeaderElectionDriver.
{code}
The test creates three {{ElectionDriver}} instances and closes them one by one
in a for loop. The logs of the failed test show that only two of the three
established the quorum connection (i.e. the log message
{{Connected to ZooKeeper quorum. Leader election can start.}} is printed only
twice). The first iteration picks the first instance, checks its leadership
and closes it. The second iteration apparently picks the instance whose
quorum connection is still not established. Its leadership future can
therefore never complete, which leaves the test stuck in the {{join}} call.
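The suspected mechanism can be reproduced with plain JDK futures. The sketch
below is only an illustration and does not use the Flink test code: the class
{{LeadershipWaitSketch}} and the method {{firstStuckIndex}} are hypothetical
names, and each {{CompletableFuture}} merely stands in for the leadership
future of one driver. It replaces the unbounded {{join}} with a bounded wait
so that the stuck iteration becomes visible instead of hanging.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in for the test loop; not part of Flink.
public class LeadershipWaitSketch {

    /**
     * Waits for each future in order and returns the index of the first one
     * that does not complete within the timeout, or -1 if all complete.
     */
    static int firstStuckIndex(List<CompletableFuture<Void>> leadershipFutures, long timeoutMillis) {
        for (int i = 0; i < leadershipFutures.size(); i++) {
            try {
                // An unbounded join() at this point would park the thread
                // forever if nobody ever completes the future.
                leadershipFutures.get(i).get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                return i;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            } catch (ExecutionException e) {
                throw new IllegalStateException(e);
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<CompletableFuture<Void>> futures = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            futures.add(new CompletableFuture<>());
        }
        // Assumption for illustration: drivers 0 and 2 connected and could be
        // granted leadership; driver 1 never connected, so its future stays open.
        futures.get(0).complete(null);
        futures.get(2).complete(null);

        System.out.println("stuck at iteration " + firstStuckIndex(futures, 200));
    }
}
```

With an unbounded {{join}} in place of the timed {{get}}, the second iteration
of this loop would block indefinitely, which matches the thread dump in the
issue description.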
> ZooKeeperMultipleComponentLeaderElectionDriverTest.testLeaderElectionWithMultipleDrivers
> runs into timeout
> ----------------------------------------------------------------------------------------------------------
>
> Key: FLINK-28078
> URL: https://issues.apache.org/jira/browse/FLINK-28078
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.16.0
> Reporter: Matthias Pohl
> Assignee: Matthias Pohl
> Priority: Major
> Labels: test-stability
>
> [Build #36189|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=36189&view=logs&j=0da23115-68bb-5dcd-192c-bd4c8adebde1&t=24c3384f-1bcb-57b3-224f-51bf973bbee8&l=10455]
> got stuck in
> {{ZooKeeperMultipleComponentLeaderElectionDriverTest.testLeaderElectionWithMultipleDrivers}}
> {code}
> "ForkJoinPool-45-worker-25" #525 daemon prio=5 os_prio=0 tid=0x00007fc74d9e3800 nid=0x62c8 waiting on condition [0x00007fc6ff2f2000]
> May 30 16:36:10 java.lang.Thread.State: WAITING (parking)
> May 30 16:36:10 at sun.misc.Unsafe.park(Native Method)
> May 30 16:36:10 - parking to wait for <0x00000000c2571b80> (a java.util.concurrent.CompletableFuture$Signaller)
> May 30 16:36:10 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> May 30 16:36:10 at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
> May 30 16:36:10 at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3313)
> May 30 16:36:10 at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)
> May 30 16:36:10 at java.util.concurrent.CompletableFuture.join(CompletableFuture.java:1947)
> May 30 16:36:10 at org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriverTest.testLeaderElectionWithMultipleDrivers(ZooKeeperMultipleComponentLeaderElectionDriverTest.java:256)
> May 30 16:36:10 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> May 30 16:36:10 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> May 30 16:36:10 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> May 30 16:36:10 at java.lang.reflect.Method.invoke(Method.java:498)
> [...]
> {code}
>