[ https://issues.apache.org/jira/browse/FLINK-22545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17400492#comment-17400492 ]
Stephan Ewen commented on FLINK-22545:
--------------------------------------
The test instability is caused by the following original exception.
The testing enumerator delays split assignment until after the next checkpoint
is triggered. However, by the time the assignment operation is executed (in the
mailbox), the target task might no longer be running, which results in the
exception below.
{code}
22:32:15,333 [SourceCoordinator-Source: numbers -> Map -> Sink: Data stream collect sink] ERROR org.apache.flink.runtime.source.coordinator.SourceCoordinatorContext [] - Uncaught Exception in Source Coordinator Executor
java.lang.IllegalArgumentException: Cannot assign splits null to subtask 0 because the subtask is not registered.
    at org.apache.flink.runtime.source.coordinator.SourceCoordinatorContext.lambda$assignSplits$3(SourceCoordinatorContext.java:182) ~[flink-runtime-1.14-SNAPSHOT.jar:1.14-SNAPSHOT]
    at org.apache.flink.runtime.source.coordinator.SourceCoordinatorContext.callInCoordinatorThread(SourceCoordinatorContext.java:397) ~[flink-runtime-1.14-SNAPSHOT.jar:1.14-SNAPSHOT]
    at org.apache.flink.runtime.source.coordinator.SourceCoordinatorContext.assignSplits(SourceCoordinatorContext.java:176) ~[flink-runtime-1.14-SNAPSHOT.jar:1.14-SNAPSHOT]
    at org.apache.flink.api.connector.source.SplitEnumeratorContext.assignSplit(SplitEnumeratorContext.java:82) ~[flink-core-1.14-SNAPSHOT.jar:1.14-SNAPSHOT]
    at org.apache.flink.api.connector.source.lib.util.IteratorSourceEnumerator.handleSplitRequest(IteratorSourceEnumerator.java:63) ~[flink-core-1.14-SNAPSHOT.jar:1.14-SNAPSHOT]
    at org.apache.flink.runtime.operators.coordination.OperatorEventSendingCheckpointITCase$AssignAfterCheckpointEnumerator.fullFillPendingRequests(OperatorEventSendingCheckpointITCase.java:311) ~[test-classes/:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_292]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_292]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_292]
{code}
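To illustrate the race, here is a minimal, self-contained sketch. It does not use the Flink classes: ToyContext and ToyEnumerator are hypothetical stand-ins for the coordinator context and the test enumerator, and the guard shown (skipping requests from subtasks that are no longer registered) is only one possible way to avoid the exception, not necessarily what the actual test fix does. In Flink itself, the set of registered readers can be inspected through SplitEnumeratorContext#registeredReaders().

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

class DelayedAssignmentSketch {

    /** Hypothetical stand-in for the coordinator context; not the Flink class. */
    static final class ToyContext {
        private final Set<Integer> registeredSubtasks = new HashSet<>();

        void register(int subtask) {
            registeredSubtasks.add(subtask);
        }

        void unregister(int subtask) {
            registeredSubtasks.remove(subtask);
        }

        boolean isRegistered(int subtask) {
            return registeredSubtasks.contains(subtask);
        }

        /** Rejects assignments to subtasks that are not registered. */
        void assignSplit(String split, int subtask) {
            if (!isRegistered(subtask)) {
                throw new IllegalArgumentException(
                        "Cannot assign split " + split + " to subtask " + subtask
                                + " because the subtask is not registered.");
            }
        }
    }

    /** Hypothetical enumerator that queues split requests and serves them later. */
    static final class ToyEnumerator {
        private final ToyContext context;
        private final Queue<Integer> pendingRequests = new ArrayDeque<>();

        ToyEnumerator(ToyContext context) {
            this.context = context;
        }

        void handleSplitRequest(int subtask) {
            pendingRequests.add(subtask);
        }

        /** Serves all queued requests and returns the number of splits assigned. */
        int fulfillPendingRequests(boolean skipUnregistered) {
            int assigned = 0;
            Integer subtask;
            while ((subtask = pendingRequests.poll()) != null) {
                if (skipUnregistered && !context.isRegistered(subtask)) {
                    continue; // the task went away while the request was queued
                }
                context.assignSplit("split-" + subtask, subtask);
                assigned++;
            }
            return assigned;
        }
    }

    public static void main(String[] args) {
        ToyContext context = new ToyContext();
        ToyEnumerator enumerator = new ToyEnumerator(context);

        context.register(0);
        enumerator.handleSplitRequest(0); // request arrives while the task runs
        context.unregister(0);            // task stops before the delayed assignment

        try {
            enumerator.fulfillPendingRequests(false);
        } catch (IllegalArgumentException e) {
            System.out.println("Unguarded assignment failed: " + e.getMessage());
        }

        enumerator.handleSplitRequest(0);
        System.out.println(
                "Guarded assignment count: " + enumerator.fulfillPendingRequests(true));
    }
}
```

Whether silently dropping such a request is acceptable depends on the source; a real enumerator would typically see the reader register again after recovery and receive a fresh request.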
I am fixing the test in this issue and opening a separate issue to downgrade
the handling of the asynchronous enumerator exceptions to a global failure
rather than a process kill: FLINK-23843
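As a rough illustration of the difference between the two failure modes, the sketch below wraps an asynchronous action so that anything it throws is handed to a failure callback instead of escaping as an uncaught exception on the executor thread. This is not the FLINK-23843 change: reportingFailures and the jobFailure holder are hypothetical names, and the callback merely stands in for whatever mechanism triggers a global job failure.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

class FailureHandlingSketch {

    /**
     * Wraps an action so that any Throwable it raises is passed to the given
     * handler rather than propagating out of the executor thread.
     */
    static Runnable reportingFailures(Runnable action, Consumer<Throwable> failureHandler) {
        return () -> {
            try {
                action.run();
            } catch (Throwable t) {
                // Hypothetical hook: a real system would fail the job here.
                failureHandler.accept(t);
            }
        };
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicReference<Throwable> jobFailure = new AtomicReference<>();
        ExecutorService executor = Executors.newSingleThreadExecutor();

        // The asynchronous action fails, but the process keeps running.
        executor.execute(reportingFailures(
                () -> {
                    throw new IllegalArgumentException("subtask is not registered");
                },
                jobFailure::set));

        executor.shutdown();
        executor.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println("Recorded failure: " + jobFailure.get());
    }
}
```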
> JVM crashes when running
> OperatorEventSendingCheckpointITCase.testOperatorEventAckLost
> -------------------------------------------------------------------------------------
>
> Key: FLINK-22545
> URL: https://issues.apache.org/jira/browse/FLINK-22545
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination, Tests
> Affects Versions: 1.12.3
> Reporter: Guowei Ma
> Assignee: Stephan Ewen
> Priority: Critical
> Labels: pull-request-available, test-stability
> Fix For: 1.14.0, 1.12.6, 1.13.3
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=17501&view=logs&j=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3&t=a99e99c7-21cd-5a1f-7274-585e62b72f56&l=4287