[jira] [Commented] (FLINK-22545) JVM crashes when runing OperatorEventSendingCheckpointITCase.testOperatorEventAckLost

Stephan Ewen (Jira) Tue, 17 Aug 2021 10:03:15 -0700


    [ https://issues.apache.org/jira/browse/FLINK-22545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17400492#comment-17400492 ]


Stephan Ewen commented on FLINK-22545:
--------------------------------------

The exception that originally causes the test instability is shown below.

The testing enumerator delays split assignment until after the next checkpoint 
is triggered. By the time the assignment operation is executed (in the 
mailbox), however, the target task might no longer be running, which results 
in the following exception.

{code}
22:32:15,333 [SourceCoordinator-Source: numbers -> Map -> Sink: Data stream collect sink] ERROR org.apache.flink.runtime.source.coordinator.SourceCoordinatorContext [] - Uncaught Exception in Source Coordinator Executor
java.lang.IllegalArgumentException: Cannot assign splits null to subtask 0 because the subtask is not registered.
        at org.apache.flink.runtime.source.coordinator.SourceCoordinatorContext.lambda$assignSplits$3(SourceCoordinatorContext.java:182) ~[flink-runtime-1.14-SNAPSHOT.jar:1.14-SNAPSHOT]
        at org.apache.flink.runtime.source.coordinator.SourceCoordinatorContext.callInCoordinatorThread(SourceCoordinatorContext.java:397) ~[flink-runtime-1.14-SNAPSHOT.jar:1.14-SNAPSHOT]
        at org.apache.flink.runtime.source.coordinator.SourceCoordinatorContext.assignSplits(SourceCoordinatorContext.java:176) ~[flink-runtime-1.14-SNAPSHOT.jar:1.14-SNAPSHOT]
        at org.apache.flink.api.connector.source.SplitEnumeratorContext.assignSplit(SplitEnumeratorContext.java:82) ~[flink-core-1.14-SNAPSHOT.jar:1.14-SNAPSHOT]
        at org.apache.flink.api.connector.source.lib.util.IteratorSourceEnumerator.handleSplitRequest(IteratorSourceEnumerator.java:63) ~[flink-core-1.14-SNAPSHOT.jar:1.14-SNAPSHOT]
        at org.apache.flink.runtime.operators.coordination.OperatorEventSendingCheckpointITCase$AssignAfterCheckpointEnumerator.fullFillPendingRequests(OperatorEventSendingCheckpointITCase.java:311) ~[test-classes/:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_292]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_292]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_292]
{code}
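
For illustration only, the race described above can be reproduced and guarded 
against in a small stand-alone sketch. This is not the change made for this 
issue, and none of the names below (GuardedAssignmentSketch, register, 
unregister, fulfillPendingRequests) are Flink APIs; they are hypothetical 
stand-ins for an enumerator that queues split requests and serves them later. 
The sketch assumes the deferred step can check whether the requesting subtask 
is still registered and simply skip it if not.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;

// Hypothetical stand-in for a test enumerator; not Flink code.
class GuardedAssignmentSketch {
    private final Set<Integer> registeredSubtasks = new HashSet<>();
    private final Queue<Integer> pendingRequests = new ArrayDeque<>();
    private final List<String> assignments = new ArrayList<>();

    void register(int subtask) {
        registeredSubtasks.add(subtask);
    }

    void unregister(int subtask) {
        registeredSubtasks.remove(subtask);
    }

    // A split request is only queued here; the assignment itself is deferred.
    void handleSplitRequest(int subtask) {
        pendingRequests.add(subtask);
    }

    // Runs later, e.g. after the next checkpoint was triggered.
    // Returns how many requests were skipped because the subtask was gone.
    int fulfillPendingRequests() {
        int skipped = 0;
        Integer subtask;
        while ((subtask = pendingRequests.poll()) != null) {
            if (!registeredSubtasks.contains(subtask)) {
                // The subtask went away between request and assignment.
                skipped++;
                continue;
            }
            assignments.add("split-for-" + subtask);
        }
        return skipped;
    }

    List<String> assignments() {
        return assignments;
    }

    public static void main(String[] args) {
        GuardedAssignmentSketch enumerator = new GuardedAssignmentSketch();
        enumerator.register(0);
        enumerator.register(1);
        enumerator.handleSplitRequest(0);
        enumerator.handleSplitRequest(1);
        enumerator.unregister(0); // subtask 0 stops before the deferred step runs
        int skipped = enumerator.fulfillPendingRequests();
        System.out.println("skipped=" + skipped + ", assigned=" + enumerator.assignments());
    }
}
```

Without the registration check, the deferred step would attempt an assignment 
to subtask 0 after it had stopped, which is the situation the exception above 
reports.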

I am fixing the test as part of this issue. I am also opening a separate 
issue, FLINK-23843, to downgrade the handling of asynchronous enumerator 
exceptions so that they cause a global failure rather than a process kill.
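
As a rough sketch of the difference between the two handling strategies, and 
not the actual FLINK-23843 change, an asynchronous task can be wrapped so that 
any exception is handed to a failure callback instead of escaping as an 
uncaught exception on the executor thread. FailureReportingSketch, 
reportingFailures and the jobFailure reference are hypothetical names used 
only for this example.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

// Hypothetical illustration; not Flink code.
class FailureReportingSketch {

    // Wraps a task so that any Throwable is passed to failureHandler
    // instead of reaching the thread's uncaught exception handler.
    static Runnable reportingFailures(Runnable task, Consumer<Throwable> failureHandler) {
        return () -> {
            try {
                task.run();
            } catch (Throwable t) {
                failureHandler.accept(t);
            }
        };
    }

    public static void main(String[] args) throws InterruptedException {
        // Stands in for "fail the job globally"; here it only records the cause.
        AtomicReference<Throwable> jobFailure = new AtomicReference<>();

        ExecutorService executor = Executors.newSingleThreadExecutor();
        executor.execute(reportingFailures(
                () -> {
                    throw new IllegalArgumentException("subtask is not registered");
                },
                jobFailure::set));
        executor.shutdown();
        executor.awaitTermination(5, TimeUnit.SECONDS);

        System.out.println("job failed with: " + jobFailure.get());
    }
}
```

In this sketch the process keeps running and the recorded cause can be used 
to fail and recover the job, whereas an uncaught-exception handler that exits 
the JVM would take the whole process down.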

> JVM crashes when runing 
> OperatorEventSendingCheckpointITCase.testOperatorEventAckLost
> -------------------------------------------------------------------------------------
>
>                 Key: FLINK-22545
>                 URL: https://issues.apache.org/jira/browse/FLINK-22545
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination, Tests
>    Affects Versions: 1.12.3
>            Reporter: Guowei Ma
>            Assignee: Stephan Ewen
>            Priority: Critical
>              Labels: pull-request-available, test-stability
>             Fix For: 1.14.0, 1.12.6, 1.13.3
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=17501&view=logs&j=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3&t=a99e99c7-21cd-5a1f-7274-585e62b72f56&l=4287



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
