[
https://issues.apache.org/jira/browse/FLINK-6042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17269306#comment-17269306
]
Matthias commented on FLINK-6042:
---------------------------------
{quote}Any signature change to {{AccessExecutionGraph}} needed to let
{{ArchivedExecutionGraph}} return more {{ErrorInfos}} would also affect the
{{ExecutionGraph}}. Hence, it might be easier to introduce a {{JobInformation}}
object which is sent to the REST handlers. This object could contain the
{{AccessExecutionGraph}} and additional information (e.g. the exception
history).
{quote}
Coming back to the idea of having some kind of {{JobInformation}} collecting
the {{ArchivedExecutionGraph}} and additional information (e.g. the exception
grouping): I realized that this would mean that we have to support this
{{JobInformation}} in the {{ExecutionGraphCache}}.
{{AbstractExecutionGraphHandler}} is the abstraction layer for handling the
{{ExecutionGraphCache}} right now. It only passes the
{{ArchivedExecutionGraph}} instances through its interface type
{{AccessExecutionGraph}} to the implementing handlers (like
{{JobExceptionsHandler}}).
In order to make the {{JobInformation}} accessible within the
{{JobExceptionsHandler}}, we have to change the method signature of
{{AbstractExecutionGraphHandler.handleRequest(HandlerRequest,
AccessExecutionGraph)}} to also expose the {{JobInformation}} to the other
implementations of {{AbstractExecutionGraphHandler}}. This feels wrong as the
other implementations do not need to know about this wrapper class
{{JobInformation}}.
Hence, I wanted to revisit my initial proposal of extending the
{{ArchivedExecutionGraph}}. What about making {{ArchivedExecutionGraph}} an
interface that extends {{AccessExecutionGraph}} with a method returning the
exception history? Additionally, we would have to change
{{ExecutionGraphCache}} to return {{ArchivedExecutionGraph}} instead of
{{AccessExecutionGraph}}.
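A minimal sketch of this proposal, assuming heavily simplified stand-ins for the real Flink types. The method name {{getExceptionHistory()}}, the {{String}} element type, and the classes {{DefaultArchivedExecutionGraph}}, {{ExecutionGraphCacheSketch}} and {{HandlerSketch}} are hypothetical and only illustrate the typing; the actual cache and handler signatures in Flink differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified stand-in for Flink's AccessExecutionGraph; the real interface has many more methods.
interface AccessExecutionGraph {
    String getJobName();
}

// Proposal: ArchivedExecutionGraph becomes an interface that adds the exception history.
interface ArchivedExecutionGraph extends AccessExecutionGraph {
    // Hypothetical method name and element type, chosen for illustration only.
    List<String> getExceptionHistory();
}

// Hypothetical immutable implementation of the proposed interface.
final class DefaultArchivedExecutionGraph implements ArchivedExecutionGraph {
    private final String jobName;
    private final List<String> exceptionHistory;

    DefaultArchivedExecutionGraph(String jobName, List<String> exceptionHistory) {
        this.jobName = jobName;
        // Defensive copy so the archived history cannot change afterwards.
        this.exceptionHistory = Collections.unmodifiableList(new ArrayList<>(exceptionHistory));
    }

    @Override
    public String getJobName() {
        return jobName;
    }

    @Override
    public List<String> getExceptionHistory() {
        return exceptionHistory;
    }
}

// Simplified cache (assumption: a plain map lookup) that returns the more specific type.
final class ExecutionGraphCacheSketch {
    private final Map<String, ArchivedExecutionGraph> graphs = new ConcurrentHashMap<>();

    void put(String jobId, ArchivedExecutionGraph graph) {
        graphs.put(jobId, graph);
    }

    ArchivedExecutionGraph getExecutionGraph(String jobId) {
        return graphs.get(jobId);
    }
}

final class HandlerSketch {
    // Handlers that do not need the history keep accepting AccessExecutionGraph.
    static String handleJobDetails(AccessExecutionGraph graph) {
        return graph.getJobName();
    }

    // Only the exceptions handler asks for the extended type.
    static List<String> handleJobExceptions(ArchivedExecutionGraph graph) {
        return graph.getExceptionHistory();
    }
}
```

Because {{ArchivedExecutionGraph}} is a subtype of {{AccessExecutionGraph}} in this sketch, a graph returned by the cache can still be passed to handlers written against {{AccessExecutionGraph}}, so no wrapper class has to be exposed to them.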
In the end, my proposal is not much different from your {{JobInformation}}
proposal. It just feels more natural to include the exception history in the
{{ArchivedExecutionGraph}} as well. [~trohrmann] what are your thoughts
on this?
> Display last n exceptions/causes for job restarts in Web UI
> -----------------------------------------------------------
>
> Key: FLINK-6042
> URL: https://issues.apache.org/jira/browse/FLINK-6042
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Coordination, Runtime / Web Frontend
> Affects Versions: 1.3.0
> Reporter: Till Rohrmann
> Assignee: Matthias
> Priority: Major
> Labels: pull-request-available
>
> Users requested that it would be nice to see the last {{n}} exceptions
> causing a job restart in the Web UI. This will help to more easily debug and
> operate a job.
> We could store the root causes for failures similar to how prior executions
> are stored in the {{ExecutionVertex}} using the {{EvictingBoundedList}} and
> then serve this information via the Web UI.
> _-- Update: January 21, 2021 --_
> The UI can already handle multiple exceptions through the Exception History.
> Right now, we list one or more exceptions which caused the job to fail.
> Instead, we could adapt it in a way that the history contains not only the
> exceptions of the most recent failure but one expandable entry per restart.
> If there is more than one exception connected to a single restart, we would
> list their stacktraces within one expandable entry.