[jira] [Comment Edited] (FLINK-17769) Wrong order of log events on a task failure

Yuan Mei (Jira) Thu, 04 Jun 2020 23:32:26 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-17769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17125674#comment-17125674
 ]


Yuan Mei edited comment on FLINK-17769 at 6/5/20, 6:31 AM:
-----------------------------------------------------------

*This is what I think is happening:*

The error message "Error during disposal of stream operator" comes from the following call chain:

 
{code:java}
StreamTask.disposeAllOperators(true);
|-StreamTask.cleanUpInvoke
   |-StreamTask.invoke
{code}
 

During StreamTask.invoke(), if anything goes wrong, cleanUpInvoke() is called 
first, and only then is the `invokeException` (the real root cause) thrown.

The root cause is handled outside StreamTask, so it is printed after 
cleanUpInvoke()'s log output.

 

This is indeed quite confusing.
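
To illustrate the ordering described above, here is a minimal, self-contained sketch. It does not use the real Flink classes: `LogOrderSketch`, the `LOG` list and the exception messages are hypothetical stand-ins, and only the method names mirror the call chain above.

```java
import java.util.ArrayList;
import java.util.List;

public class LogOrderSketch {
    // Collects log lines in emission order (hypothetical stand-in for the real logger).
    static final List<String> LOG = new ArrayList<>();

    // Simplified stand-in for StreamTask.invoke(): clean up first, then rethrow the root cause.
    static void invoke() throws Exception {
        try {
            throw new RuntimeException("root cause");  // something goes wrong while running
        } catch (Exception invokeException) {
            cleanUpInvoke();                            // cleanup logs its own errors right away
            throw invokeException;                      // the root cause leaves the task only now
        }
    }

    // Simplified stand-in for cleanUpInvoke() -> disposeAllOperators(): close() fails and is logged here.
    static void cleanUpInvoke() {
        try {
            throw new IllegalStateException("child process has been shutdown");
        } catch (Exception e) {
            LOG.add("ERROR Error during disposal of stream operator: " + e.getMessage());
        }
    }

    // Stand-in for the caller outside StreamTask that logs the state transition.
    static List<String> run() {
        LOG.clear();
        try {
            invoke();
        } catch (Exception rootCause) {
            LOG.add("WARN switched from RUNNING to FAILED: " + rootCause.getMessage());
        }
        return new ArrayList<>(LOG);
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

In this sketch the disposal error is recorded before the root cause, which matches the order seen in the reported log.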

 

 


was (Author: ym):
*This is what I think is happening:*

The error message "Error during disposal of stream operator" comes from the following call chain:

 
{code:java}
StreamTask.disposeAllOperators(true);
|-StreamTask.cleanUpInvoke
   |-StreamTask.invoke
{code}
 

During StreamTask.invoke(), if anything goes wrong, cleanUpInvoke() is called 
first, and only then is the `invokeException` (the real root cause) thrown.

The root cause is handled outside StreamTask, so it is printed after 
cleanUpInvoke()'s log output.

 

This is indeed quite confusing.

 
 # I was wondering whether printing the stack trace of the `invokeException` 
before calling cleanUpInvoke() would help?
 # Are there any IT cases that reproduce this error, so that I can verify my understanding?
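
The first question could be sketched roughly as follows. This is only an illustration of the idea, not a proposed patch: `LogBeforeCleanupSketch`, the `LOG` list and all messages are hypothetical, and the real StreamTask code is not used.

```java
import java.util.ArrayList;
import java.util.List;

public class LogBeforeCleanupSketch {
    // Collects log lines in emission order (hypothetical stand-in for the real logger).
    static final List<String> LOG = new ArrayList<>();

    // Variant of invoke() that logs the root cause before running the cleanup.
    static void invoke() throws Exception {
        try {
            throw new RuntimeException("root cause");
        } catch (Exception invokeException) {
            // Assumed extra log line: record the root cause first.
            LOG.add("WARN task failed, cleaning up: " + invokeException.getMessage());
            cleanUpInvoke();
            throw invokeException;
        }
    }

    // Cleanup that fails and logs its own error, as in the reported case.
    static void cleanUpInvoke() {
        try {
            throw new IllegalStateException("child process has been shutdown");
        } catch (Exception e) {
            LOG.add("ERROR Error during disposal of stream operator: " + e.getMessage());
        }
    }

    // Stand-in for the caller outside StreamTask that logs the state transition.
    static List<String> run() {
        LOG.clear();
        try {
            invoke();
        } catch (Exception rootCause) {
            LOG.add("WARN switched from RUNNING to FAILED: " + rootCause.getMessage());
        }
        return new ArrayList<>(LOG);
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

With this ordering the root cause shows up first in the log, at the cost of being reported twice (once before cleanup and once on the state transition).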

 

 

> Wrong order of log events on a task failure
> -------------------------------------------
>
>                 Key: FLINK-17769
>                 URL: https://issues.apache.org/jira/browse/FLINK-17769
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Task
>            Reporter: Robert Metzger
>            Priority: Critical
>             Fix For: 1.11.0
>
>
> In this example, errors from the {{close()}} method call are logged before 
> the {{switched from RUNNING to FAILED}} log line with the actual exception. 
> This is confusing: because the exceptions coming from {{close()}} appear 
> first in the log, they could be mistaken for the root cause of the failure.
> {code}
> 2020-05-14 10:12:42,660 INFO  org.apache.flink.streaming.connectors.kinesis.FlinkKinesisProducer [] - Started Kinesis producer instance for region 'eu-central-1'
> 2020-05-14 10:12:42,660 DEBUG org.apache.flink.streaming.api.operators.BackendRestorerProcedure [] - Creating operator state backend for StreamSource_cbc357ccb763df2852fee8c4fc7d55f2_(1/1) with empty state.
> 2020-05-14 10:12:42,823 INFO  org.apache.flink.streaming.connectors.kinesis.FlinkKinesisProducer [] - Closing producer
> 2020-05-14 10:12:42,823 INFO  org.apache.flink.streaming.connectors.kinesis.FlinkKinesisProducer [] - Flushing outstanding 2 records
> 2020-05-14 10:12:42,826 ERROR org.apache.flink.streaming.runtime.tasks.StreamTask          [] - Error during disposal of stream operator.
> org.apache.flink.kinesis.shaded.com.amazonaws.services.kinesis.producer.DaemonException: The child process has been shutdown and can no longer accept messages.
> 2020-05-14 10:12:42,834 WARN  org.apache.flink.runtime.taskmanager.Task                    [] - Source: Custom Source -> Sink: Unnamed (1/1) (4a49aea047aeb3e67cf79c788df0e558) switched from RUNNING to FAILED.
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
