[
https://issues.apache.org/jira/browse/FLINK-17769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17126514#comment-17126514
]
Roman Khachatryan commented on FLINK-17769:
-------------------------------------------
Thanks for your analysis [~ym].
I think we could also collect disposal errors on the TM without logging them
and send them to the JM as part of the failure reason. It would simplify some
things, but I now think that's impractical for large deployments.
I think we should log the reason before closing operators (something like the
1st option). I'd also increase the log level in afterInvoke for "Finished task".
I also see this simple bug:
{code:java}
LOG.warn("{} ({}) switched from {} to {}.", taskNameWithSubtask, executionId,
    currentState, newState, cause);
{code}
The last line is intended to also print the cause.
Maybe if we print the reason before closing operators, we should fix this
LOG.warn and replace cause with cause.getMessage().
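A minimal sketch of the ordering I have in mind, in plain Java. The names
here (FailureLogOrderSketch, Operator, failTask) are hypothetical and not
Flink classes, and log lines are collected in a list rather than written
through a real logger, so this only illustrates the order of events:

```java
import java.util.ArrayList;
import java.util.List;

class FailureLogOrderSketch {

    /** Hypothetical stand-in for a stream operator whose close() may fail. */
    interface Operator {
        void close() throws Exception;
    }

    /**
     * Records the failure reason first, then closes the operators.
     * Returns the log lines in the order they would be emitted.
     */
    static List<String> failTask(String taskName, Throwable cause, List<Operator> operators) {
        List<String> log = new ArrayList<>();
        // 1. Log the root cause (message only) before any cleanup runs.
        log.add("WARN " + taskName + " switched from RUNNING to FAILED: " + cause.getMessage());
        // 2. Close the operators; disposal errors now appear after the root cause.
        for (Operator operator : operators) {
            try {
                operator.close();
            } catch (Exception e) {
                log.add("ERROR Error during disposal of stream operator: " + e.getMessage());
            }
        }
        return log;
    }

    public static void main(String[] args) {
        List<Operator> operators = new ArrayList<>();
        // First operator fails on close, second closes cleanly.
        operators.add(() -> { throw new IllegalStateException("child process has been shutdown"); });
        operators.add(() -> { });
        Throwable cause = new RuntimeException("actual failure");
        for (String line : failTask("Source (1/1)", cause, operators)) {
            System.out.println(line);
        }
    }
}
```

With this ordering the first line a reader sees is the actual failure, and
any errors from close() follow it instead of preceding it.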
> Wrong order of log events on a task failure
> -------------------------------------------
>
> Key: FLINK-17769
> URL: https://issues.apache.org/jira/browse/FLINK-17769
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Task
> Reporter: Robert Metzger
> Priority: Critical
> Fix For: 1.11.0
>
>
> In this example, errors from the {{close()}} method call are logged before
> the {{switched from RUNNING to FAILED}} log line with the actual exception
> (which is confusing, because the exceptions coming from {{close()}} could be
> considered as the failure root cause, because they are first in the log)
> {code}
> 2020-05-14 10:12:42,660 INFO
> org.apache.flink.streaming.connectors.kinesis.FlinkKinesisProducer [] -
> Started Kinesis producer instance for region 'eu-central-1'
> 2020-05-14 10:12:42,660 DEBUG
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure [] -
> Creating operator state backend for
> StreamSource_cbc357ccb763df2852fee8c4fc7d55f2_(1/1) with empty state.
> 2020-05-14 10:12:42,823 INFO
> org.apache.flink.streaming.connectors.kinesis.FlinkKinesisProducer [] -
> Closing producer
> 2020-05-14 10:12:42,823 INFO
> org.apache.flink.streaming.connectors.kinesis.FlinkKinesisProducer [] -
> Flushing outstanding 2 records
> 2020-05-14 10:12:42,826 ERROR
> org.apache.flink.streaming.runtime.tasks.StreamTask [] - Error
> during disposal of stream operator.
> org.apache.flink.kinesis.shaded.com.amazonaws.services.kinesis.producer.DaemonException:
> The child process has been shutdown and can no longer accept messages.
> 2020-05-14 10:12:42,834 WARN org.apache.flink.runtime.taskmanager.Task
> [] - Source: Custom Source -> Sink: Unnamed (1/1)
> (4a49aea047aeb3e67cf79c788df0e558) switched from RUNNING to FAILED.
> {code}