[jira] [Commented] (FLINK-17769) Wrong order of log events on a task failure

Piotr Nowojski (Jira) Fri, 05 Jun 2020 00:56:26 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-17769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17126505#comment-17126505
 ]


Piotr Nowojski commented on FLINK-17769:
----------------------------------------

1.
I think, as you wrote, this option is not a good one because of the duplicated 
logging problems.
2.
The problem here will be that we do not clean up all of the resources. In this 
method we are supposed to clean up everything that we can, regardless of the 
errors.
3.
Maybe, if there was no other way.

One remark: I think the problem might be a bit more common. There are other 
places in {{cleanUpInvoke}} that log errors as well.

What about option 4: remember the first exception and suppress the later ones, 
similar to how {{TaskExecutor#stopTaskExecutorServices}} does it, for example? 
Keep in mind that an exception thrown from {{cleanUpInvoke}} is already subject 
to similar logic in {{StreamTask#invoke}} if {{runMailboxLoop}} or 
{{afterInvoke}} has thrown an exception. So if an exception was already thrown 
during normal execution, an error thrown from {{cleanUpInvoke}} would be 
suppressed.
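A minimal sketch of what option 4 could look like, under stated assumptions: 
the class and methods ({{CleanupSketch}}, {{cleanUpAll}}, {{invoke}}) are 
hypothetical and do not reproduce Flink's actual {{StreamTask}} code. Flink's 
{{org.apache.flink.util.ExceptionUtils}} has a {{firstOrSuppressed}} helper in 
this spirit; it is re-implemented here so the example runs without Flink on the 
classpath. The cleanup closes every resource, does not log, keeps the first 
failure and attaches later ones via {{Throwable#addSuppressed}}. The invoke 
method then applies the same rule one level up, so a cleanup error never hides 
the original failure.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of "remember the first exception, suppress the later ones". */
public class CleanupSketch {

    /** Returns newException if there is no previous one, else attaches it to previous as suppressed. */
    public static Exception firstOrSuppressed(Exception newException, Exception previous) {
        if (previous == null) {
            return newException;
        }
        previous.addSuppressed(newException);
        return previous;
    }

    /** Closes every resource even if some fail; rethrows the first failure with later ones suppressed. */
    public static void cleanUpAll(List<AutoCloseable> resources) throws Exception {
        Exception first = null;
        for (AutoCloseable resource : resources) {
            try {
                resource.close();
            } catch (Exception e) {
                // No logging here: the caller reports the failure once,
                // so cleanup errors cannot appear in the log before the root cause.
                first = firstOrSuppressed(e, first);
            }
        }
        if (first != null) {
            throw first;
        }
    }

    /** Runs the main work, then always cleans up; a cleanup error is suppressed into an earlier failure. */
    public static void invoke(Runnable mainLoop, List<AutoCloseable> resources) throws Exception {
        Exception failure = null;
        try {
            mainLoop.run();
        } catch (Exception e) {
            failure = e;
        }
        try {
            cleanUpAll(resources);
        } catch (Exception cleanupError) {
            failure = firstOrSuppressed(cleanupError, failure);
        }
        if (failure != null) {
            throw failure;
        }
    }

    public static void main(String[] args) {
        List<String> closed = new ArrayList<>();
        AutoCloseable a = () -> { closed.add("a"); throw new IllegalStateException("close a failed"); };
        AutoCloseable b = () -> { closed.add("b"); throw new IllegalStateException("close b failed"); };
        AutoCloseable c = () -> closed.add("c");
        try {
            invoke(() -> { throw new RuntimeException("main loop failed"); }, List.of(a, b, c));
        } catch (Exception e) {
            System.out.println(e.getMessage());                                        // main loop failed
            System.out.println(e.getSuppressed()[0].getMessage());                     // close a failed
            System.out.println(e.getSuppressed()[0].getSuppressed()[0].getMessage());  // close b failed
            System.out.println(closed);                                                // [a, b, c]
        }
    }
}
```

The design choice is that the cleanup path only collects exceptions and leaves 
reporting to a single place, which is what would keep the close() errors from 
being logged ahead of the "switched from RUNNING to FAILED" line.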


> Wrong order of log events on a task failure
> -------------------------------------------
>
>                 Key: FLINK-17769
>                 URL: https://issues.apache.org/jira/browse/FLINK-17769
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Task
>            Reporter: Robert Metzger
>            Priority: Critical
>             Fix For: 1.11.0
>
>
> In this example, errors from the {{close()}} method call are logged before 
> the {{switched from RUNNING to FAILED}} log line with the actual exception 
> (which is confusing, because the exceptions coming from {{close()}} could be 
> mistaken for the failure's root cause, since they appear first in the log):
> {code}
> 2020-05-14 10:12:42,660 INFO  org.apache.flink.streaming.connectors.kinesis.FlinkKinesisProducer [] - Started Kinesis producer instance for region 'eu-central-1'
> 2020-05-14 10:12:42,660 DEBUG org.apache.flink.streaming.api.operators.BackendRestorerProcedure [] - Creating operator state backend for StreamSource_cbc357ccb763df2852fee8c4fc7d55f2_(1/1) with empty state.
> 2020-05-14 10:12:42,823 INFO  org.apache.flink.streaming.connectors.kinesis.FlinkKinesisProducer [] - Closing producer
> 2020-05-14 10:12:42,823 INFO  org.apache.flink.streaming.connectors.kinesis.FlinkKinesisProducer [] - Flushing outstanding 2 records
> 2020-05-14 10:12:42,826 ERROR org.apache.flink.streaming.runtime.tasks.StreamTask          [] - Error during disposal of stream operator.
> org.apache.flink.kinesis.shaded.com.amazonaws.services.kinesis.producer.DaemonException: The child process has been shutdown and can no longer accept messages.
> 2020-05-14 10:12:42,834 WARN  org.apache.flink.runtime.taskmanager.Task                    [] - Source: Custom Source -> Sink: Unnamed (1/1) (4a49aea047aeb3e67cf79c788df0e558) switched from RUNNING to FAILED.
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
