[ 
https://issues.apache.org/jira/browse/FLINK-15169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-15169:
----------------------------
    Description: 
The WebUI relies on {{ExecutionGraph#failureInfo}} and {{Execution#failureCause}} 
to generate error info (via {{JobExceptionsHandler#createJobExceptionsInfo}}). 
Errors that occur during scheduling in DefaultScheduler are not recorded in those 
fields, so they cannot be shown to users in the WebUI or via REST queries.

To solve it: 
1. Global failures should be recorded into {{ExecutionGraph#failureInfo}} via 
{{ExecutionGraph#initFailureCause}}, which can be exposed as 
{{SchedulerBase#initFailureCause}}.
2. For task failures, one solution I can think of is to avoid invoking 
{{DefaultScheduler#handleTaskFailure}} directly on the scheduler's internal 
failures. Instead, we can introduce 
{{ExecutionVertexOperations#fail(ExecutionVertex)}} to hand the error to 
{{ExecutionVertex}} as a common failure.
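
A rough sketch of the two steps above, to make the intended flow concrete. It is only an illustration under assumptions: the nested types {{Graph}}, {{Vertex}} and {{Scheduler}} are simplified, hypothetical stand-ins and not the actual Flink classes, and {{onInternalSchedulingError}} is a made-up name. Only the idea is taken from the proposal: record the failure where the WebUI reads it, then continue with normal failure handling.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Illustrative stand-ins only: none of these nested types are Flink classes.
final class SchedulingFailureSketch {

    /** Holds the two pieces of state that the exceptions view would read. */
    static final class Graph {
        // Plays the role of ExecutionGraph#failureInfo.
        private Throwable failureInfo;
        // Plays the role of Execution#failureCause, keyed by a vertex id.
        private final Map<String, Throwable> failureCauses = new HashMap<>();

        void initFailureCause(Throwable cause) {
            failureInfo = cause;
        }

        Optional<Throwable> failureInfo() {
            return Optional.ofNullable(failureInfo);
        }

        Optional<Throwable> failureCause(String vertexId) {
            return Optional.ofNullable(failureCauses.get(vertexId));
        }
    }

    /** A vertex that records its own failure before notifying the scheduler. */
    static final class Vertex {
        private final String id;
        private final Graph graph;

        Vertex(String id, Graph graph) {
            this.id = id;
            this.graph = graph;
        }

        void fail(Throwable cause, Scheduler scheduler) {
            graph.failureCauses.put(id, cause);     // now visible to the exceptions view
            scheduler.handleTaskFailure(id, cause); // then the usual failure handling
        }
    }

    static final class Scheduler {
        private final Graph graph;
        private int handledTaskFailures;

        Scheduler(Graph graph) {
            this.graph = graph;
        }

        // Step 1: a global failure is written to the graph so it can be displayed.
        void handleGlobalFailure(Throwable cause) {
            graph.initFailureCause(cause);
        }

        // Step 2: an internal scheduling error is handed to the vertex
        // instead of calling handleTaskFailure directly.
        void onInternalSchedulingError(Vertex vertex, Throwable cause) {
            vertex.fail(cause, this);
        }

        void handleTaskFailure(String vertexId, Throwable cause) {
            handledTaskFailures++; // restart logic would go here
        }

        int handledTaskFailures() {
            return handledTaskFailures;
        }
    }

    public static void main(String[] args) {
        Graph graph = new Graph();
        Scheduler scheduler = new Scheduler(graph);
        Vertex vertex = new Vertex("v1", graph);

        scheduler.handleGlobalFailure(new IllegalStateException("no resources"));
        scheduler.onInternalSchedulingError(vertex, new RuntimeException("slot allocation failed"));

        System.out.println(graph.failureInfo().map(Throwable::getMessage).orElse("none"));
        System.out.println(graph.failureCause("v1").map(Throwable::getMessage).orElse("none"));
    }
}
```

In this sketch the scheduler never calls {{handleTaskFailure}} itself for its own errors. The vertex is the single place where a task failure is recorded, so a scheduling error and an ordinary task failure would end up in the same field.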

cc [~gjy]

  was:
WebUI relies on {{ExecutionGraph#failureInfo}} and {{Execution#failureCause}} 
to generate error info (vis {{JobExceptionsHandler#createJobExceptionsInfo}}). 
Errors happen in the scheduling of DefaultScheduler are not recorded into those 
fields, thus cannot be shown to users in WebUI (nor via REST queries).

To solve it, 
1. global failures should be recorded into {{ExecutionGraph#failureInfo}}, via 
{{ExecutionGraph#initFailureCause}} which can be exposed as 
{{SchedulerBase#initFailureCause}}.
2. for task failures, one solution I can think of is to avoid invoking 
{{DefaultScheduler#handleTaskFailure}} directly on scheduler's internal 
failures. Instead, we can introduce 
{{ExecutionVertexOperations#fail(ExecutionVertex)}} to hand the error to 
{{ExecutionVertex}} as a common failure.

cc [~gjy]


> Errors happen in the scheduling of DefaultScheduler is not shown in WebUI
> -------------------------------------------------------------------------
>
>                 Key: FLINK-15169
>                 URL: https://issues.apache.org/jira/browse/FLINK-15169
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.10.0
>            Reporter: Zhu Zhu
>            Priority: Blocker
>             Fix For: 1.10.0
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
