[
https://issues.apache.org/jira/browse/FLINK-32049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17741903#comment-17741903
]
Weijie Guo commented on FLINK-32049:
------------------------------------
I suspect this happens because
{{ChannelStateWriteRequestExecutorFactory#getOrCreateExecutor}} can obtain an
executor whose registration has already been closed. That is, there may be a
window in which {{isRegistering}} has already become false but {{onRegistered}}
has not been called yet, so the executor is still not null and gets reused.
I am not very familiar with the related code, so I cannot fully confirm this
hypothesis. [~fanrui] Could you help confirm it? If this is indeed possible,
I'd like to fix it.
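The suspected interleaving can be replayed in a small toy model. This is only a sketch of the hypothesis, not the Flink implementation: {{ToyExecutor}}, {{ToyFactory}}, {{closeRegistration}} and {{notifyRegistered}} are hypothetical names, and splitting the "flip {{isRegistering}}" and "call {{onRegistered}}" steps into two methods is an assumption made so the window between them can be reproduced deterministically on one thread.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

/** Toy model of the suspected interleaving; NOT the Flink implementation. */
public class RegistrationRaceSketch {

    /** Hypothetical stand-in for the executor. */
    static final class ToyExecutor {
        final AtomicBoolean isRegistering = new AtomicBoolean(true);
        final Consumer<ToyExecutor> onRegistered;

        ToyExecutor(Consumer<ToyExecutor> onRegistered) {
            this.onRegistered = onRegistered;
        }

        /** Rejects new subtasks once registration has been closed. */
        void registerSubtask() {
            if (!isRegistering.get()) {
                throw new IllegalStateException("This executor has been registered.");
            }
        }

        /** Step 1 (assumed): flip the flag so no further subtasks are accepted. */
        boolean closeRegistration() {
            return isRegistering.compareAndSet(true, false);
        }

        /** Step 2 (assumed): notify the factory so it drops its reference. */
        void notifyRegistered() {
            onRegistered.accept(this);
        }
    }

    /** Hypothetical stand-in for the factory that caches one executor. */
    static final class ToyFactory {
        ToyExecutor executor;

        ToyExecutor getOrCreateExecutor() {
            if (executor == null) {
                // The callback clears the cached executor once registration is done.
                executor = new ToyExecutor(e -> executor = null);
            }
            executor.registerSubtask();
            return executor;
        }
    }

    /** Replays the window between step 1 and step 2 and reports what happens. */
    static String replaySuspectedInterleaving() {
        ToyFactory factory = new ToyFactory();
        ToyExecutor first = factory.getOrCreateExecutor();
        first.closeRegistration(); // flag is false, callback has not run yet
        try {
            factory.getOrCreateExecutor(); // cached executor is still not null
            return "no failure";
        } catch (IllegalStateException e) {
            return e.getMessage();
        } finally {
            first.notifyRegistered(); // only now does the factory forget the executor
        }
    }

    public static void main(String[] args) {
        System.out.println(replaySuspectedInterleaving());
    }
}
```

In this model the second call fails with the same message as in the reported stack trace, and a call made after {{notifyRegistered}} gets a fresh executor. Whether the real code can actually reach this window depends on its locking, which the sketch does not model.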
> CoordinatedSourceRescaleITCase.testDownscaling fails on AZP
> -----------------------------------------------------------
>
> Key: FLINK-32049
> URL: https://issues.apache.org/jira/browse/FLINK-32049
> Project: Flink
> Issue Type: Bug
> Components: Connectors / Common
> Affects Versions: 1.18.0, 1.17.1
> Reporter: Sergey Nuyanzin
> Assignee: Qingsheng Ren
> Priority: Critical
> Labels: test-stability
>
> CoordinatedSourceRescaleITCase.testDownscaling fails with
> {noformat}
> May 08 03:19:14 [ERROR] Failures:
> May 08 03:19:14 [ERROR] CoordinatedSourceRescaleITCase.testDownscaling:75->resumeCheckpoint:107
> May 08 03:19:14 Multiple Failures (1 failure)
> May 08 03:19:14 -- failure 1 --
> May 08 03:19:14 [Any cause contains message 'successfully restored checkpoint']
> May 08 03:19:14 Expecting any element of:
> May 08 03:19:14 [org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
> May 08 03:19:14 at org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:144)
> May 08 03:19:14 at org.apache.flink.runtime.minicluster.MiniClusterJobClient.lambda$getJobExecutionResult$3(MiniClusterJobClient.java:141)
> May 08 03:19:14 at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
> May 08 03:19:14 ...(45 remaining lines not displayed - this can be changed with Assertions.setMaxStackTraceElementsDisplayed),
> May 08 03:19:14 org.apache.flink.runtime.JobException: Recovery is suppressed by NoRestartBackoffTimeStrategy
> May 08 03:19:14 at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:139)
> May 08 03:19:14 at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:83)
> May 08 03:19:14 at org.apache.flink.runtime.scheduler.DefaultScheduler.recordTaskFailure(DefaultScheduler.java:258)
> May 08 03:19:14 ...(35 remaining lines not displayed - this can be changed with Assertions.setMaxStackTraceElementsDisplayed),
> May 08 03:19:14 java.lang.IllegalStateException: This executor has been registered.
> May 08 03:19:14 at org.apache.flink.util.Preconditions.checkState(Preconditions.java:193)
> May 08 03:19:14 at org.apache.flink.runtime.checkpoint.channel.ChannelStateWriteRequestExecutorImpl.registerSubtask(ChannelStateWriteRequestExecutorImpl.java:341)
> May 08 03:19:14 at org.apache.flink.runtime.checkpoint.channel.ChannelStateWriteRequestExecutorFactory.getOrCreateExecutor(ChannelStateWriteRequestExecutorFactory.java:63)
> May 08 03:19:14 ...(17 remaining lines not displayed - this can be changed with Assertions.setMaxStackTraceElementsDisplayed)]
> May 08 03:19:14 to satisfy the given assertions requirements but none did:
> May 08 03:19:14
> May 08 03:19:14 org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
> May 08 03:19:14 at org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:144)
> May 08 03:19:14 at org.apache.flink.runtime.minicluster.MiniClusterJobClient.lambda$getJobExecutionResult$3(MiniClusterJobClient.java:141)
> May 08 03:19:14 at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
> May 08 03:19:14 ...(45 remaining lines not displayed - this can be changed with Assertions.setMaxStackTraceElementsDisplayed)
> May 08 03:19:14 error:
> May 08 03:19:14 Expecting throwable message:
> May 08 03:19:14 "Job execution failed."
> May 08 03:19:14 to contain:
> May 08 03:19:14 "successfully restored checkpoint"
> May 08 03:19:14 but did not.
> May 08 03:19:14
> {noformat}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=48772&view=logs&j=fc7981dc-d266-55b0-5fff-f0d0a2294e36&t=1a9b228a-3e0e-598f-fc81-c321539dfdbf&l=7191