[jira] [Commented] (FLINK-17769) Wrong order of log events on a task failure

Piotr Nowojski (Jira) Fri, 05 Jun 2020 01:15:12 -0700


    [ https://issues.apache.org/jira/browse/FLINK-17769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17126522#comment-17126522 ]


Piotr Nowojski commented on FLINK-17769:
----------------------------------------

{quote}
I think we could also collect disposal errors without logging on TM and send 
them as part of failure reason to JM. It would simplify some things. But I'm 
thinking now that for large deployments it's impractical.
{quote}
This is a tough call. If we add the secondary failures from disposal as
suppressed exceptions, we risk breaking some things, like end-to-end
tests that filter for expected/allowed error messages. We could also
pollute the JM logs further. On the other hand, some of those failures might
be important, indicating resource leaks or other problems.
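A minimal sketch of the "suppressed exceptions" option, using the standard Java `Throwable.addSuppressed`. The class `SuppressedDisposalSketch`, the `Operator` interface and `runThenDispose` are hypothetical names for illustration only, not Flink's actual `StreamTask` disposal code. The task's own exception stays the primary failure, and any errors thrown by `close()` travel with it as suppressed exceptions instead of being logged separately (and earlier) on the TM.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Flink's StreamTask API.
public class SuppressedDisposalSketch {

    /** Hypothetical stand-in for a stream operator whose close() may fail. */
    public interface Operator {
        void close() throws Exception;
    }

    /**
     * Runs the task body, then closes every operator. The first failure (normally
     * the task's own exception) is the one rethrown; disposal errors are attached
     * to it as suppressed exceptions rather than logged on their own.
     */
    public static void runThenDispose(Runnable body, List<Operator> operators) throws Exception {
        Exception failure = null;
        try {
            body.run();
        } catch (RuntimeException e) {
            failure = e; // primary failure: the real root cause
        }
        for (Operator op : operators) {
            try {
                op.close();
            } catch (Exception disposalError) {
                if (failure == null) {
                    failure = disposalError; // disposal was the only failure
                } else {
                    failure.addSuppressed(disposalError); // secondary failure
                }
            }
        }
        if (failure != null) {
            throw failure;
        }
    }

    public static void main(String[] args) {
        List<Operator> ops = new ArrayList<>();
        ops.add(() -> { throw new IllegalStateException("child process has been shutdown"); });
        try {
            runThenDispose(() -> { throw new RuntimeException("actual failure"); }, ops);
        } catch (Exception e) {
            System.out.println("root cause: " + e.getMessage());
            for (Throwable s : e.getSuppressed()) {
                System.out.println("  suppressed: " + s.getMessage());
            }
        }
    }
}
```

This keeps the root cause first, but every disposal error now ends up in the reported failure, which is exactly the trade-off above: anything that filters expected error messages would start seeing them, and the JM-side failure reports get noisier.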

> Wrong order of log events on a task failure
> -------------------------------------------
>
>                 Key: FLINK-17769
>                 URL: https://issues.apache.org/jira/browse/FLINK-17769
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Task
>            Reporter: Robert Metzger
>            Assignee: Yuan Mei
>            Priority: Critical
>             Fix For: 1.11.0
>
>
> In this example, errors from the {{close()}} method call are logged before
> the {{switched from RUNNING to FAILED}} log line that carries the actual
> exception. This is confusing: because the exceptions coming from {{close()}}
> appear first in the log, they could be mistaken for the root cause of the failure.
> {code}
> 2020-05-14 10:12:42,660 INFO  org.apache.flink.streaming.connectors.kinesis.FlinkKinesisProducer [] - Started Kinesis producer instance for region 'eu-central-1'
> 2020-05-14 10:12:42,660 DEBUG org.apache.flink.streaming.api.operators.BackendRestorerProcedure [] - Creating operator state backend for StreamSource_cbc357ccb763df2852fee8c4fc7d55f2_(1/1) with empty state.
> 2020-05-14 10:12:42,823 INFO  org.apache.flink.streaming.connectors.kinesis.FlinkKinesisProducer [] - Closing producer
> 2020-05-14 10:12:42,823 INFO  org.apache.flink.streaming.connectors.kinesis.FlinkKinesisProducer [] - Flushing outstanding 2 records
> 2020-05-14 10:12:42,826 ERROR org.apache.flink.streaming.runtime.tasks.StreamTask          [] - Error during disposal of stream operator.
> org.apache.flink.kinesis.shaded.com.amazonaws.services.kinesis.producer.DaemonException: The child process has been shutdown and can no longer accept messages.
> 2020-05-14 10:12:42,834 WARN  org.apache.flink.runtime.taskmanager.Task                    [] - Source: Custom Source -> Sink: Unnamed (1/1) (4a49aea047aeb3e67cf79c788df0e558) switched from RUNNING to FAILED.
> {code}
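The ordering problem in the issue can be illustrated with a small, hypothetical sketch: report the task's state transition (with its root cause) before disposing operators, so that disposal errors can only appear after it. `FailureLogOrderSketch`, `failTask` and `Operator` are illustrative names, not Flink's `Task`/`StreamTask` code, and the `LOG` list is an assumed stand-in for a real logger so the emission order is easy to inspect.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Flink's Task/StreamTask implementation.
public class FailureLogOrderSketch {

    /** Hypothetical stand-in for a stream operator whose close() may fail. */
    public interface Operator {
        void close() throws Exception;
    }

    /** Collected log lines in emission order (stand-in for a real logger). */
    public static final List<String> LOG = new ArrayList<>();

    public static void failTask(Throwable cause, List<Operator> operators) {
        // Report the root cause first, so it is the earliest failure line in the log.
        LOG.add("WARN switched from RUNNING to FAILED: " + cause.getMessage());
        // Only then dispose the operators; their errors are clearly secondary.
        for (Operator op : operators) {
            try {
                op.close();
            } catch (Exception e) {
                LOG.add("ERROR Error during disposal of stream operator: " + e.getMessage());
            }
        }
    }

    public static void main(String[] args) {
        List<Operator> ops = new ArrayList<>();
        ops.add(() -> { throw new IllegalStateException("child process has been shutdown"); });
        failTask(new RuntimeException("actual failure"), ops);
        LOG.forEach(System.out::println);
    }
}
```

Unlike attaching suppressed exceptions, this keeps disposal errors in the TM log (so possible resource leaks stay visible) and only changes their position relative to the root cause.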



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
