Zoltán Zvara created SPARK-9739:
-----------------------------------

             Summary: Execution visualizer
                 Key: SPARK-9739
                 URL: https://issues.apache.org/jira/browse/SPARK-9739
             Project: Spark
          Issue Type: Improvement
          Components: Web UI
            Reporter: Zoltán Zvara


Apache Spark, and in particular its Web UI component, lacks a tool that helps 
users understand the physical plan produced by the task scheduler and monitor 
execution at a very low level, including the communication triggered by data 
flow and remote block requests. We propose a tool that would let users monitor 
job executions in real time, and later replay and examine them, on any cluster 
currently supported by Spark.

The visualizer we are implementing would allow users to monitor a Spark 
program’s data flow at task level during execution, in the current web user 
interface provided by the master. One would be able to see where executors and 
tasks get deployed on the cluster, along with the communication triggered by 
tasks, on a representative graph.

For this, we minimally modify Spark’s core so that it can collect information 
related to block requests. The tasks’ code is slightly modified and refactored 
so that tasks can report their execution state to a monitoring object on the 
driver, which has been added to SparkContext. Most aspects of the proposed 
module are configurable.
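
As an illustration of the reporting path described above, here is a minimal 
sketch in Python (not Spark's own Scala code). The names TaskEvent and 
ExecutionMonitor, the event fields, the state strings and the `enabled` switch 
are all assumptions made for this example; the proposal does not specify the 
actual schema or configuration keys.

```python
import json
from dataclasses import dataclass, asdict, field
from typing import List, Optional


@dataclass
class TaskEvent:
    """One report sent by a task to the driver (hypothetical schema)."""
    task_id: int
    executor_id: str
    host: str
    state: str                # assumed values: "STARTED", "BLOCK_REQUEST", "FINISHED"
    timestamp_ms: int
    # Executor the block was requested from; only set for block requests.
    remote_executor_id: Optional[str] = None


@dataclass
class ExecutionMonitor:
    """Driver-side collector standing in for the monitoring object."""
    enabled: bool = True      # hypothetical configuration switch
    events: List[TaskEvent] = field(default_factory=list)

    def report(self, event: TaskEvent) -> None:
        # Ignore reports entirely when monitoring is switched off.
        if self.enabled:
            self.events.append(event)

    def to_json_lines(self) -> str:
        # One JSON object per line, in arrival order.
        return "\n".join(json.dumps(asdict(e), sort_keys=True) for e in self.events)


monitor = ExecutionMonitor()
monitor.report(TaskEvent(0, "exec-1", "host-a", "STARTED", 1000))
monitor.report(TaskEvent(0, "exec-1", "host-a", "BLOCK_REQUEST", 1200, "exec-2"))
monitor.report(TaskEvent(0, "exec-1", "host-a", "FINISHED", 1500))
print(monitor.to_json_lines())
```

Keeping the switch inside the collector means a disabled monitor costs the 
tasks a single boolean check per report in this sketch.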

Our execution visualizer should not have any measurable performance impact on 
Spark programs, and would bring the benefits listed below.

*Benefits*
We think the execution visualizer would give end users the following benefits:
- a better understanding of Spark’s execution mechanism, by demonstrating how 
executors and tasks work internally, which could attract new users;
- the ability to discover issues with executors and tasks in a more detailed 
and convenient way, through advanced visual monitoring of programs;
- the possibility to highlight inefficient communication patterns in certain 
workflows, which would give insight for advanced optimization strategies.

*Implementation*
We modified tasks to send more detailed information to the driver before and 
after their actual work; the driver collects this information as JSON on its 
file system. The visualizer, written using the D3 JavaScript library, would 
read these logs at regular intervals. It would provide the following main 
features:
- dynamically show hosts, executors and tasks, both running and finishing, in 
a graph;
- show critical and additional backend information related to hosts and 
executors (along with available resources);
- show useful information about running tasks: RDD and split to compute, 
dependencies, stages and more;
- show failed executors and tasks;
- show task metrics and provide multiple ways to summarize them;
- show communication as directed edges between executors, in the form of block 
requests;
- let the user replay executions at different speeds.
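
Two of the features above, directed block-request edges and replay at a chosen 
speed, can be sketched from a JSON log as follows. This is an illustration in 
Python rather than the D3 code of the visualizer, and the log layout (one JSON 
object per line, with the field names shown) is an assumption of this example, 
not the format used by the proposed module.

```python
import json
from collections import Counter
from typing import Iterator, List, Tuple

# Hypothetical log content: one JSON event per line.
SAMPLE_LOG = """\
{"executor_id": "exec-1", "remote_executor_id": "exec-2", "state": "BLOCK_REQUEST", "timestamp_ms": 1200}
{"executor_id": "exec-1", "remote_executor_id": "exec-2", "state": "BLOCK_REQUEST", "timestamp_ms": 1300}
{"executor_id": "exec-2", "remote_executor_id": "exec-1", "state": "BLOCK_REQUEST", "timestamp_ms": 1700}
{"executor_id": "exec-2", "remote_executor_id": null, "state": "FINISHED", "timestamp_ms": 2000}
"""


def parse_events(log_text: str) -> List[dict]:
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in log_text.splitlines() if line.strip()]


def block_request_edges(events: List[dict]) -> Counter:
    """Count block requests per directed (requester, remote) executor pair."""
    edges: Counter = Counter()
    for e in events:
        if e["state"] == "BLOCK_REQUEST" and e.get("remote_executor_id"):
            edges[(e["executor_id"], e["remote_executor_id"])] += 1
    return edges


def replay_delays(events: List[dict], speed: float) -> Iterator[Tuple[float, dict]]:
    """Yield (seconds to wait before showing the event, event) in time order.

    A speed of 2.0 halves every gap between consecutive events.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    previous_ms = None
    for e in sorted(events, key=lambda ev: ev["timestamp_ms"]):
        gap_ms = 0 if previous_ms is None else e["timestamp_ms"] - previous_ms
        previous_ms = e["timestamp_ms"]
        yield gap_ms / 1000.0 / speed, e


events = parse_events(SAMPLE_LOG)
for (src, dst), count in sorted(block_request_edges(events).items()):
    print(f"{src} -> {dst}: {count} block request(s)")
for delay, event in replay_delays(events, speed=2.0):
    print(f"wait {delay:.3f}s, then show {event['state']} on {event['executor_id']}")
```

A front end could use the edge counts as edge weights in the graph and feed 
the delays to a timer to drive the replay.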


