[jira] [Commented] (HIVE-18368) Improve Spark Debug RDD Graph

Rui Li (JIRA) Wed, 10 Jan 2018 00:51:49 -0800

    [ 
https://issues.apache.org/jira/browse/HIVE-18368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16319913#comment-16319913
 ]


Rui Li commented on HIVE-18368:
-------------------------------

Hi [~stakiar], two questions regarding the screenshot:
# Why is the number of partitions of MapInput 0?
# It seems confusing to have two RDDs with the same work name, e.g. "Reducer 
3", "Map 11". Can we name the shuffled RDD "ShuffleTran" and the Hadoop RDD 
"MapInput"?

> Improve Spark Debug RDD Graph
> -----------------------------
>
>                 Key: HIVE-18368
>                 URL: https://issues.apache.org/jira/browse/HIVE-18368
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>         Attachments: HIVE-18368.1.patch, HIVE-18368.2.patch, Spark UI - Named 
> RDDs.png
>
>
> The {{SparkPlan}} class does some logging to show the mapping between 
> different {{SparkTran}}, what shuffle types are used, and what trans are 
> cached. However, there is room for improvement.
> When debug logging is enabled the RDD graph is logged, but there isn't much 
> information printed about each RDD.
> We should combine both of the graphs and improve them. We could even make the 
> Spark Plan graph part of the {{EXPLAIN EXTENDED}} output.
> Ideally, the final graph shows a clear relationship between Tran objects, 
> RDDs, and BaseWorks. Edges should include information about the number of 
> partitions, shuffle types, Spark operations used, etc.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
