[
https://issues.apache.org/jira/browse/FLINK-24538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17429345#comment-17429345
]
xmarker edited comment on FLINK-24538 at 10/15/21, 3:57 PM:
------------------------------------------------------------
I investigated the code related to this issue, and I think the failure may
occur in the following scenario:
1. When `retrievalEventHandler.waitForNewLeader(timeout)` is called at line
434, TestingRetrievalBase.waitForNewLeader waits for and receives correct
leader information.
2. But by the time TestingRetrievalBase returns `leader.getLeaderAddress()`,
the leaderRetrievalDriver has been notified of empty leader information
(perhaps because the ZooKeeper connection was suspended or lost, see
ZooKeeperLeaderRetrievalDriver.handleStateChange), so `leader` no longer holds
the value that was just received.
3. So we can store the result of leaderEventQueue.poll in a local variable, in
case the object's field changes in the meantime.
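The idea in step 3 could look roughly like the sketch below. It is a simplified stand-in, not the actual TestingRetrievalBase code: the class `LeaderRetrievalSketch`, the record `LeaderInfo`, and the methods `notifyLeader` and `currentLeader` are hypothetical names chosen for illustration, and a null address is assumed to model empty leader information.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified stand-in for TestingRetrievalBase; not the actual Flink code.
class LeaderRetrievalSketch {

    // Minimal leader record; a null address models empty leader information (assumption).
    static final class LeaderInfo {
        private final String leaderAddress;

        LeaderInfo(String leaderAddress) {
            this.leaderAddress = leaderAddress;
        }

        String getLeaderAddress() {
            return leaderAddress;
        }

        boolean isEmpty() {
            return leaderAddress == null;
        }
    }

    private final BlockingQueue<LeaderInfo> leaderEventQueue = new LinkedBlockingQueue<>();

    // Shared field: the notifying thread may overwrite it at any time.
    private volatile LeaderInfo leader;

    // Assumed to be called from the retrieval driver's thread, possibly with empty information.
    void notifyLeader(String leaderAddress) {
        LeaderInfo info = new LeaderInfo(leaderAddress);
        leader = info;
        leaderEventQueue.offer(info);
    }

    LeaderInfo currentLeader() {
        return leader;
    }

    // Waits for the next non-empty leader and returns its address.
    String waitForNewLeader(long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        try {
            while (true) {
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0) {
                    throw new IllegalStateException(
                            "No new leader within " + timeoutMillis + " ms");
                }
                // Local variable: a later notifyLeader(null) cannot change what is returned.
                LeaderInfo polled = leaderEventQueue.poll(remaining, TimeUnit.NANOSECONDS);
                if (polled != null && !polled.isEmpty()) {
                    return polled.getLeaderAddress();
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for a leader", e);
        }
    }

    public static void main(String[] args) {
        LeaderRetrievalSketch retrieval = new LeaderRetrievalSketch();
        retrieval.notifyLeader("leader-1"); // correct leader information arrives
        retrieval.notifyLeader(null);       // then empty information overwrites the field
        // The address comes from the polled local value, not from the shared field.
        System.out.println(retrieval.waitForNewLeader(1000));
        System.out.println(retrieval.currentLeader().isEmpty());
    }
}
```

In this sketch the shared field is already empty when the address is read, but the method still returns the address it polled, so there is no NullPointerException at that point.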
[~wangyang0918] do you have any advice?
was (Author: xmarker):
I investigated the code related to this issue, and I think the failure may
occur in the following scenario:
1. When `retrievalEventHandler.waitForNewLeader(timeout)` is called at line
434, TestingRetrievalBase.waitForNewLeader waits for and receives correct
leader information.
2. But by the time TestingRetrievalBase returns `leader.getLeaderAddress()`,
the leaderRetrievalDriver has been notified of empty leader information
(perhaps because the ZooKeeper connection was suspended or lost, see
ZooKeeperLeaderRetrievalDriver.handleStateChange).
3. So we can add a lock in TestingRetrievalBase around changes to its leader
information.
[~wangyang0918] do you have any advice?
> ZooKeeperLeaderElectionTest.testLeaderShouldBeCorrectedWhenOverwritten fails
> with NPE
> -------------------------------------------------------------------------------------
>
> Key: FLINK-24538
> URL: https://issues.apache.org/jira/browse/FLINK-24538
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.14.0
> Reporter: Xintong Song
> Priority: Major
> Labels: test-stability
> Fix For: 1.15.0, 1.14.1
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=25020&view=logs&j=f2b08047-82c3-520f-51ee-a30fd6254285&t=3810d23d-4df2-586c-103c-ec14ede6af00&l=7573
> {code}
> Oct 13 22:26:04 [ERROR] Tests run: 8, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 12.355 s <<< FAILURE! - in org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionTest
> Oct 13 22:26:04 [ERROR] testLeaderShouldBeCorrectedWhenOverwritten Time elapsed: 1.138 s <<< ERROR!
> Oct 13 22:26:04 java.lang.NullPointerException
> Oct 13 22:26:04 	at org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionTest.testLeaderShouldBeCorrectedWhenOverwritten(ZooKeeperLeaderElectionTest.java:434)
> {code}