[ https://issues.apache.org/jira/browse/SPARK-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14326247#comment-14326247 ]

Yin Huai commented on SPARK-2973:
---------------------------------

I just tried our master branch. sql("show tables").collect() does not start a 
job. However, sql("show tables").take(1) does start a job, because our 
overridden executeTake in ExecutedCommand is not called in this case. 

The reason is that DataFrame.take(1) calls DataFrame.head(1), and head calls 
limit(1).collect(). Inside limit, we create a DataFrame whose logicalPlan is 
Limit(Literal(1), ExecutedCommand(ShowTablesCommand)). When we create the 
DataFrame for Limit, because ExecutedCommand is a command, we create a 
LogicalRDD (see 
[here|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/DataFrameImpl.scala#L77])
 and call queryExecution.toRDD on this ExecutedCommand. The queryExecution of 
sql("show tables").limit(1) is:
{code}
== Parsed Logical Plan ==
Limit 1
 LogicalRDD [tableName#10,isTemporary#11], ParallelCollectionRDD[7] at 
parallelize at commands.scala:65

== Analyzed Logical Plan ==
Limit 1
 LogicalRDD [tableName#10,isTemporary#11], ParallelCollectionRDD[7] at 
parallelize at commands.scala:65

== Optimized Logical Plan ==
Limit 1
 LogicalRDD [tableName#10,isTemporary#11], ParallelCollectionRDD[7] at 
parallelize at commands.scala:65

== Physical Plan ==
Limit 1
 PhysicalRDD [tableName#10,isTemporary#11], ParallelCollectionRDD[7] at 
parallelize at commands.scala:65
{code}

So, Limit.executeCollect calls PhysicalRDD.executeTake, which triggers a job.
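To make the two code paths concrete, here is a minimal toy model of the 
delegation chain described above (these are simplified stand-in classes, not 
the real Spark plan nodes): collect() on a command returns its locally 
computed result without a job, while take(1) goes through a Limit over a 
PhysicalRDD, whose executeTake is what launches the job.

{code}
// A physical plan node in this toy model.
sealed trait ToyPlan {
  def executeCollect(): Seq[String]
}

// Stands in for ExecutedCommand: the result is computed locally,
// so executeCollect starts no job.
case class ToyCommand(rows: Seq[String]) extends ToyPlan {
  override def executeCollect(): Seq[String] = rows
}

// Stands in for PhysicalRDD wrapping the command's already-computed
// output; running it (collect or take) counts as starting a job.
case class ToyPhysicalRDD(rows: Seq[String]) extends ToyPlan {
  var jobStarted = false
  override def executeCollect(): Seq[String] = { jobStarted = true; rows }
  def executeTake(n: Int): Seq[String] = { jobStarted = true; rows.take(n) }
}

// Stands in for Limit: its executeCollect delegates to the child's
// executeTake, which is the step that triggers the job in the real path.
case class ToyLimit(n: Int, child: ToyPhysicalRDD) extends ToyPlan {
  override def executeCollect(): Seq[String] = child.executeTake(n)
}
{code}

In this model, ToyCommand(...).executeCollect() mirrors the collect() path 
(no job), while ToyLimit(1, ToyPhysicalRDD(...)).executeCollect() mirrors 
the take(1) path, where the child's executeTake fires.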

> Add a way to show tables without executing a job
> ------------------------------------------------
>
>                 Key: SPARK-2973
>                 URL: https://issues.apache.org/jira/browse/SPARK-2973
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Aaron Davidson
>            Assignee: Cheng Lian
>            Priority: Critical
>             Fix For: 1.2.0
>
>
> Right now, sql("show tables").collect() will start a Spark job which shows up 
> in the UI. There should be a way to get these without this step.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
