[
https://issues.apache.org/jira/browse/FLINK-25566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Piotr Nowojski updated FLINK-25566:
-----------------------------------
Component/s: Runtime / Coordination
(was: Runtime / Task)
> Fail to cancel task if disk is bad for java.lang.NoClassDefFoundError
> ---------------------------------------------------------------------
>
> Key: FLINK-25566
> URL: https://issues.apache.org/jira/browse/FLINK-25566
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Coordination
> Reporter: Liu
> Priority: Major
> Attachments: image-2022-01-07-19-07-10-968.png,
> image-2022-01-07-19-08-49-038.png, image-2022-01-07-19-11-39-448.png,
> image-2022-01-13-10-45-02-495.png, image-2022-01-13-10-52-56-490.png,
> image-2022-01-13-10-56-10-668.png, taskmanager.log
>
>
> When a disk error occurs, the affected task gets stuck because of a
> java.lang.NoClassDefFoundError. Our internal Flink version is 1.10.0, and we
> have modified some of the code. The full log and the related code are shown
> in the pictures below, followed by our analysis.
> !image-2022-01-13-10-45-02-495.png|width=1708,height=913!
> !image-2022-01-13-10-52-56-490.png|width=896,height=689!
> !image-2022-01-13-10-56-10-668.png|width=820,height=366!
> The sequence of events is as follows:
> # A disk error occurs.
> # The exception is caught in Task's doRun method.
> # When ExceptionUtils.isJvmFatalError(t) is called, another exception,
> 'java.lang.NoClassDefFoundError: org/apache/flink/util/ExceptionUtils', is
> thrown.
> # notifyFatalError is called in TaskManagerRunner. I guess that the method
> cannot execute because ExceptionUtils cannot be found.
> # Finally, notifyFinalState is called in Task. Since the state has not
> transitioned to failed, the log 'java.lang.IllegalStateException: null' is
> printed.
> Maybe we should catch errors such as NoClassDefFoundError and then call
> terminateJVM().
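
The suggestion above could look roughly like the following. This is only a minimal sketch under stated assumptions, not Flink's actual code: FatalErrorGuard, Outcome, and handle are hypothetical names, and the classifier, the task-failure path, and the JVM termination are passed in as stand-ins for ExceptionUtils.isJvmFatalError, the normal failure handling in Task, and terminateJVM(). It relies on the fact that NoClassDefFoundError is a subclass of LinkageError.

```java
import java.util.function.Predicate;

public class FatalErrorGuard {

    /** Result of handling a task failure, returned so callers can inspect it. */
    public enum Outcome { TASK_FAILED, JVM_TERMINATED }

    /**
     * Classifies the cause and either fails the task or terminates the process.
     * If the classifier itself cannot be loaded or linked (for example with a
     * NoClassDefFoundError), the process is treated as unrecoverable.
     */
    public static Outcome handle(
            Throwable cause,
            Predicate<Throwable> isFatal, // stand-in for ExceptionUtils.isJvmFatalError
            Runnable failTask,            // stand-in for the normal task failure path
            Runnable terminateJvm) {      // stand-in for terminateJVM()
        boolean fatal;
        try {
            fatal = isFatal.test(cause);
        } catch (LinkageError e) {
            // The helper class could not be loaded; assume the JVM is broken.
            fatal = true;
        }
        if (fatal) {
            terminateJvm.run();
            return Outcome.JVM_TERMINATED;
        }
        failTask.run();
        return Outcome.TASK_FAILED;
    }

    public static void main(String[] args) {
        // Simulate the reported case: the classifier throws NoClassDefFoundError.
        Outcome outcome = handle(
                new java.io.IOException("disk error"),
                t -> { throw new NoClassDefFoundError("org/apache/flink/util/ExceptionUtils"); },
                () -> System.out.println("task marked as failed"),
                // Assumption: a real handler might call Runtime.getRuntime().halt(-1) here.
                () -> System.out.println("terminating JVM"));
        System.out.println(outcome);
    }
}
```

Passing the termination step in as a Runnable keeps the sketch testable without actually halting the process; whether halting is acceptable in every such case is a design decision for the real fix.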
--
This message was sent by Atlassian Jira
(v8.20.1#820001)