This is an automated email from the ASF dual-hosted git repository.
wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new cd2ef9c [SPARK-35567][SQL] Fix: Explain cost is not showing statistics for all the nodes
cd2ef9c is described below
commit cd2ef9cb43f4e27302906ac2f605df4b66a72a2f
Author: shahid <[email protected]>
AuthorDate: Tue Jun 1 00:55:29 2021 +0800
[SPARK-35567][SQL] Fix: Explain cost is not showing statistics for all the nodes
### What changes were proposed in this pull request?
The `EXPLAIN COST` command in Spark currently doesn't show statistics for all
the nodes. It misses some nodes in almost all of the TPC-DS queries.
In this PR, we collect all the plan nodes, including those in subqueries,
and compute the statistics for each node if they don't already exist in the
stats cache.
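
The idea above can be illustrated with a small, self-contained sketch. It is a toy model, not Spark's code: `Node`, `Scan`, `Filter`, `GlobalAgg`, `FilterWithSubquery` and `missingStats` are hypothetical names, and the size estimates are made up. It assumes only what the description states: each node caches its own statistics, and asking the root for its statistics does not necessarily populate the cache of every node below it.

```scala
// Toy model of a plan tree with a per-node stats cache (hypothetical, not Spark classes).
sealed trait Node {
  def children: Seq[Node]
  def subqueries: Seq[Node] = Nil

  private var cache: Option[Long] = None
  def cached: Option[Long] = cache

  // How this node estimates its own row count.
  protected def compute: Long

  // Return the cached estimate, computing and storing it on first access.
  def stats: Long = cache.getOrElse {
    val s = compute
    cache = Some(s)
    s
  }

  // Pre-order traversal over this node, its children, and its subquery plans.
  def collectWithSubqueries[B](f: PartialFunction[Node, B]): Seq[B] =
    f.lift(this).toSeq ++
      children.flatMap(_.collectWithSubqueries(f)) ++
      subqueries.flatMap(_.collectWithSubqueries(f))
}

final case class Scan(rows: Long) extends Node {
  def children: Seq[Node] = Nil
  protected def compute: Long = rows
}

// Estimate derived from the child, so computing it also fills the child's cache.
final case class Filter(child: Node) extends Node {
  def children: Seq[Node] = Seq(child)
  protected def compute: Long = child.stats / 2
}

// Estimate that never consults the child, so the child's cache stays empty.
final case class GlobalAgg(child: Node) extends Node {
  def children: Seq[Node] = Seq(child)
  protected def compute: Long = 1L
}

// A node that carries a subquery plan in addition to its child.
final case class FilterWithSubquery(child: Node, subquery: Node) extends Node {
  def children: Seq[Node] = Seq(child)
  override def subqueries: Seq[Node] = Seq(subquery)
  protected def compute: Long = child.stats / 2
}

object StatsSketch {
  // Number of nodes (including subquery nodes) whose stats were never computed.
  def missingStats(plan: Node): Int =
    plan.collectWithSubqueries { case n => n }.count(_.cached.isEmpty)
}
```

In this model, calling `stats` on the root alone leaves every node below a `GlobalAgg` without cached statistics, while `plan.collectWithSubqueries { case n => n.stats }` visits each node and fills every cache.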
### Why are the changes needed?
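**Before Fix**
For example, in Query 1 the Project node doesn't have statistics:
[image not preserved]
In Query 15 the Aggregate node doesn't have statistics:
[image not preserved]
**After Fix:**
Query 1:
[image not preserved]
Query 15:
[image not preserved]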
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Manual testing
Closes #32704 from shahidki31/shahid/fixshowstats.
Authored-by: shahid <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
---
.../org/apache/spark/sql/execution/QueryExecution.scala | 14 ++++++--------
1 file changed, 6 insertions(+), 8 deletions(-)
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala
index 0795776..247baea 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala
@@ -28,7 +28,6 @@ import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{AnalysisException, Row, SparkSession}
import org.apache.spark.sql.catalyst.{InternalRow, QueryPlanningTracker}
import org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker
-import org.apache.spark.sql.catalyst.expressions.SubqueryExpression
import org.apache.spark.sql.catalyst.expressions.codegen.ByteCodeStats
import org.apache.spark.sql.catalyst.plans.QueryPlan
import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, ReturnAnswer}
@@ -256,13 +255,12 @@ class QueryExecution(
// trigger to compute stats for logical plans
try {
- optimizedPlan.foreach(_.expressions.foreach(_.foreach {
- case subqueryExpression: SubqueryExpression =>
- // trigger subquery's child plan stats propagation
- subqueryExpression.plan.stats
- case _ =>
- }))
- optimizedPlan.stats
+ // This will trigger to compute stats for all the nodes in the plan, including subqueries,
+ // if the stats doesn't exist in the statsCache and update the statsCache corresponding
+ // to the node.
+ optimizedPlan.collectWithSubqueries {
+ case plan => plan.stats
+ }
} catch {
case e: AnalysisException => append(e.toString + "\n")
}