[
https://issues.apache.org/jira/browse/SPARK-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14326247#comment-14326247
]
Yin Huai commented on SPARK-2973:
---------------------------------
I just tried our master branch. sql("show tables").collect() will not start a job.
However, sql("show tables").take(1) will start one, because our overridden
executeTake in ExecutedCommand is not called in this case.
The reason is that DataFrame.take(1) calls DataFrame.head(1), and head in turn
calls limit(1).collect(). Inside limit, we create a DataFrame with
Limit(Literal(1), ExecutedCommand(ShowTablesCommand)) as the logicalPlan. When
we create the DataFrame for Limit, because ExecutedCommand is a command, we
create a LogicalRDD (see
[here|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/DataFrameImpl.scala#L77])
and call queryExecution.toRDD on this ExecutedCommand. The queryExecution of
sql("show tables").limit(1) will be:
{code}
== Parsed Logical Plan ==
Limit 1
LogicalRDD [tableName#10,isTemporary#11], ParallelCollectionRDD[7] at
parallelize at commands.scala:65
== Analyzed Logical Plan ==
Limit 1
LogicalRDD [tableName#10,isTemporary#11], ParallelCollectionRDD[7] at
parallelize at commands.scala:65
== Optimized Logical Plan ==
Limit 1
LogicalRDD [tableName#10,isTemporary#11], ParallelCollectionRDD[7] at
parallelize at commands.scala:65
== Physical Plan ==
Limit 1
PhysicalRDD [tableName#10,isTemporary#11], ParallelCollectionRDD[7] at
parallelize at commands.scala:65
{code}
So, Limit.executeCollect calls PhysicalRDD.executeTake, which triggers a job.
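To make the difference concrete, here is a minimal spark-shell sketch of the two call paths described above (assuming a SQLContext named sqlContext; the comments summarize the behavior observed on master, not a guaranteed API contract):

{code}
// Runs ExecutedCommand.executeCollect locally; the command's result rows
// are already materialized on the driver, so no Spark job is submitted.
sqlContext.sql("show tables").collect()

// take(1) -> head(1) -> limit(1).collect(): the ExecutedCommand is wrapped
// in Limit(Literal(1), ...) over a LogicalRDD, so Limit.executeCollect ends
// up in PhysicalRDD.executeTake and a job shows up in the UI.
sqlContext.sql("show tables").take(1)
{code}

The fix would need limit/take over a command to short-circuit to the command's local result instead of planning it as an RDD scan.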
> Add a way to show tables without executing a job
> ------------------------------------------------
>
> Key: SPARK-2973
> URL: https://issues.apache.org/jira/browse/SPARK-2973
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: Aaron Davidson
> Assignee: Cheng Lian
> Priority: Critical
> Fix For: 1.2.0
>
>
> Right now, sql("show tables").collect() will start a Spark job which shows up
> in the UI. There should be a way to get these without this step.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]