[ 
https://issues.apache.org/jira/browse/FLINK-25566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Piotr Nowojski updated FLINK-25566:
-----------------------------------
    Component/s: Runtime / Coordination
                     (was: Runtime / Task)

> Fail to cancel task if disk is bad for java.lang.NoClassDefFoundError
> ---------------------------------------------------------------------
>
>                 Key: FLINK-25566
>                 URL: https://issues.apache.org/jira/browse/FLINK-25566
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Coordination
>            Reporter: Liu
>            Priority: Major
>         Attachments: image-2022-01-07-19-07-10-968.png, 
> image-2022-01-07-19-08-49-038.png, image-2022-01-07-19-11-39-448.png, 
> image-2022-01-13-10-45-02-495.png, image-2022-01-13-10-52-56-490.png, 
> image-2022-01-13-10-56-10-668.png, taskmanager.log
>
>
> When a disk error occurs, the affected task gets stuck because of a 
> java.lang.NoClassDefFoundError. Our internal Flink version is 1.10.0 and we 
> have modified some code. The full log and the related code are shown below. 
> The analysis, together with the code, follows the pictures.
> !image-2022-01-13-10-45-02-495.png|width=1708,height=913!
> !image-2022-01-13-10-52-56-490.png|width=896,height=689!
> !image-2022-01-13-10-56-10-668.png|width=820,height=366!
> The process is as follows:
>  # A disk error occurs.
>  # The exception is caught in Task's doRun method.
>  # When ExceptionUtils.isJvmFatalError(t) is called, another exception, 
> 'java.lang.NoClassDefFoundError: org/apache/flink/util/ExceptionUtils', is 
> thrown.
>  # notifyFatalError is called in TaskManagerRunner. I guess that the method 
> cannot execute because ExceptionUtils is not found.
>  # In Task, notifyFinalState is finally called. Since the state has not 
> transitioned to failed, the log 'java.lang.IllegalStateException: null' is 
> printed.
> Maybe we should catch exceptions such as NoClassDefFoundError and then call 
> terminateJVM().
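
The suggestion above could be sketched roughly as follows. This is only an illustration under stated assumptions, not Flink's actual code: FatalErrorGuard, FatalCheck and handleFailure are hypothetical names, the check parameter stands in for the ExceptionUtils.isJvmFatalError(t) call, and the terminateJvm hook stands in for terminateJVM(). The idea is that if the classification step itself fails with a LinkageError (NoClassDefFoundError is a subclass), the failure is treated as fatal instead of escaping the handler.

```java
public class FatalErrorGuard {

    /** Hypothetical stand-in for the fatal-error classification step. */
    public interface FatalCheck {
        boolean isFatal(Throwable t);
    }

    /**
     * Classifies the failure. If the classification itself cannot run because
     * a class fails to load (for example, the jar sits on a bad disk), the
     * situation is treated as fatal. Returns true if the terminate hook ran.
     */
    public static boolean handleFailure(
            Throwable cause, FatalCheck check, Runnable terminateJvm) {
        boolean fatal;
        try {
            fatal = check.isFatal(cause);
        } catch (LinkageError e) {
            // NoClassDefFoundError extends LinkageError: the normal error
            // path can no longer be trusted, so fall back to termination.
            fatal = true;
        }
        if (fatal) {
            terminateJvm.run();
        }
        return fatal;
    }
}
```

In a real process the hook might be something like () -> Runtime.getRuntime().halt(1), but the exit code and the choice of halt over exit are assumptions here. The hook is passed in as a Runnable so that the guard itself depends only on JDK classes that are already loaded.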



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
