[ 
https://issues.apache.org/jira/browse/FLINK-6042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17265128#comment-17265128
 ] 

Till Rohrmann commented on FLINK-6042:
--------------------------------------

Thanks for the proposal [~mapohl]. I have a few comments:

1) When restarting with the new scheduler, we recreate the 
{{ExecutionGraph}}. Hence, exposing this error information on the 
{{AccessExecutionGraph}} might not work. 

2) {{UpdateSchedulerNgOnInternalFailuresListener}} will only be called if an 
exception occurs on the JM. If there is a normal task failure, we will 
call {{updateTaskExecutionState}} instead.

3) It would be great to group the exceptions by restart cycle in the 
web UI. Seeing the root causes for a restart and then being able to expand 
the view to see the task failures for that specific restart would be awesome.
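
To make the grouping idea a bit more concrete, here is a rough sketch of a bounded history that keeps the last n restart cycles, each holding a root cause plus the task failures recorded for it. The class names {{RestartCycle}} and {{RestartHistory}} are hypothetical and not existing Flink classes. The sketch uses a plain {{ArrayDeque}} instead of Flink's {{EvictingBoundedList}}, and it assumes the history is owned by something that outlives a single {{ExecutionGraph}}, so that it survives the recreation mentioned in 1).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical: one restart cycle, i.e. a root cause plus its task failures. */
final class RestartCycle {
    final Throwable rootCause;
    final List<Throwable> taskFailures = new ArrayList<>();

    RestartCycle(Throwable rootCause) {
        this.rootCause = rootCause;
    }
}

/** Hypothetical: keeps only the last maxCycles restart cycles, evicting the oldest. */
final class RestartHistory {
    private final int maxCycles;
    private final Deque<RestartCycle> cycles = new ArrayDeque<>();

    RestartHistory(int maxCycles) {
        if (maxCycles <= 0) {
            throw new IllegalArgumentException("maxCycles must be positive");
        }
        this.maxCycles = maxCycles;
    }

    /** Opens a new cycle for a failure that triggers a restart. */
    RestartCycle startCycle(Throwable rootCause) {
        if (cycles.size() == maxCycles) {
            cycles.removeFirst(); // evict the oldest cycle
        }
        RestartCycle cycle = new RestartCycle(rootCause);
        cycles.addLast(cycle);
        return cycle;
    }

    /** Attaches a task failure to the most recent cycle; ignored if there is none. */
    void addTaskFailure(Throwable failure) {
        RestartCycle current = cycles.peekLast();
        if (current != null) {
            current.taskFailures.add(failure);
        }
    }

    /** Returns a copy of the retained cycles, oldest first. */
    List<RestartCycle> snapshot() {
        return new ArrayList<>(cycles);
    }
}
```

The web UI could then render each {{RestartCycle}} as a collapsed row showing {{rootCause}}, with {{taskFailures}} shown on expansion. How the two failure paths from 2) would feed into {{startCycle}} and {{addTaskFailure}} is left open here.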

> Display last n exceptions/causes for job restarts in Web UI
> -----------------------------------------------------------
>
>                 Key: FLINK-6042
>                 URL: https://issues.apache.org/jira/browse/FLINK-6042
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Coordination, Runtime / Web Frontend
>    Affects Versions: 1.3.0
>            Reporter: Till Rohrmann
>            Assignee: Matthias
>            Priority: Major
>
> Users requested that it would be nice to see the last {{n}} exceptions 
> causing a job restart in the Web UI. This will help to more easily debug and 
> operate a job.
> We could store the root causes for failures similar to how prior executions 
> are stored in the {{ExecutionVertex}} using the {{EvictingBoundedList}} and 
> then serve this information via the Web UI.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
