Repository: spark
Updated Branches:
  refs/heads/branch-2.0 a2f68ded2 -> f3570bcea


[SPARK-15636][SQL] Make aggregate expressions more concise in explain

## What changes were proposed in this pull request?
This patch reduces the verbosity of aggregate expressions in explain output (without 
actually removing any information). As an example, for the following command:
```
spark.range(10).selectExpr("sum(id) + 1", "count(distinct id)").explain(true)
```

Output before this patch:
```
== Physical Plan ==
*TungstenAggregate(key=[], functions=[(sum(id#0L),mode=Final,isDistinct=false),(count(id#0L),mode=Final,isDistinct=true)], output=[(sum(id) + 1)#3L,count(DISTINCT id)#16L])
+- Exchange SinglePartition, None
   +- *TungstenAggregate(key=[], functions=[(sum(id#0L),mode=PartialMerge,isDistinct=false),(count(id#0L),mode=Partial,isDistinct=true)], output=[sum#18L,count#21L])
      +- *TungstenAggregate(key=[id#0L], functions=[(sum(id#0L),mode=PartialMerge,isDistinct=false)], output=[id#0L,sum#18L])
         +- Exchange hashpartitioning(id#0L, 5), None
            +- *TungstenAggregate(key=[id#0L], functions=[(sum(id#0L),mode=Partial,isDistinct=false)], output=[id#0L,sum#18L])
               +- *Range (0, 10, splits=2)
```

Output after this patch:
```
== Physical Plan ==
*TungstenAggregate(key=[], functions=[sum(id#0L),count(distinct id#0L)], output=[(sum(id) + 1)#3L,count(DISTINCT id)#16L])
+- Exchange SinglePartition, None
   +- *TungstenAggregate(key=[], functions=[merge_sum(id#0L),partial_count(distinct id#0L)], output=[sum#18L,count#21L])
      +- *TungstenAggregate(key=[id#0L], functions=[merge_sum(id#0L)], output=[id#0L,sum#18L])
         +- Exchange hashpartitioning(id#0L, 5), None
            +- *TungstenAggregate(key=[id#0L], functions=[partial_sum(id#0L)], output=[id#0L,sum#18L])
               +- *Range (0, 10, splits=2)
```

Note the change from `(sum(id#0L),mode=PartialMerge,isDistinct=false)` to 
`merge_sum(id#0L)`.

In general, aggregate explain output is still very verbose; further reductions will 
be done in follow-up pull requests.

## How was this patch tested?
Tested manually.

Author: Reynold Xin <[email protected]>

Closes #13367 from rxin/SPARK-15636.

(cherry picked from commit 472f16181d199684996a156b0e429bc525d65a57)
Signed-off-by: Yin Huai <[email protected]>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f3570bce
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f3570bce
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f3570bce

Branch: refs/heads/branch-2.0
Commit: f3570bcea697704f6f10fa62109300ce3cf6b28b
Parents: a2f68de
Author: Reynold Xin <[email protected]>
Authored: Sat May 28 14:14:36 2016 -0700
Committer: Yin Huai <[email protected]>
Committed: Sat May 28 14:15:15 2016 -0700

----------------------------------------------------------------------
 .../spark/sql/catalyst/expressions/Expression.scala  |  2 +-
 .../catalyst/expressions/aggregate/interfaces.scala  | 15 ++++++++++++++-
 2 files changed, 15 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/f3570bce/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala
----------------------------------------------------------------------
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala
index b4fe151..2ec4621 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala
@@ -185,7 +185,7 @@ abstract class Expression extends TreeNode[Expression] {
    */
   def prettyName: String = nodeName.toLowerCase
 
-  private def flatArguments = productIterator.flatMap {
+  protected def flatArguments = productIterator.flatMap {
     case t: Traversable[_] => t
     case single => single :: Nil
   }
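The visibility change above exists so that subclasses can reuse `flatArguments`; a 
minimal sketch of the pattern, with illustrative stand-in names (`Expr`, `Sum`) rather 
than Spark's real class hierarchy:

```scala
// Why flatArguments moves from private to protected: a subclass
// (AggregateFunction in the real code) needs it to build its explain string.
abstract class Expr extends Product {
  // Flattens constructor arguments, expanding any collection arguments inline.
  protected def flatArguments: Iterator[Any] = productIterator.flatMap {
    case t: Traversable[_] => t
    case single            => single :: Nil
  }
}

case class Sum(children: Seq[String]) extends Expr {
  def toAggString(isDistinct: Boolean): String = {
    val start = if (isDistinct) "(distinct " else "("
    "sum" + flatArguments.mkString(start, ", ", ")")
  }
}

println(Sum(Seq("id#0L")).toAggString(isDistinct = false))
// sum(id#0L)
```

Had `flatArguments` stayed `private`, the `toAggString` helper added in 
interfaces.scala below could not call it.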

http://git-wip-us.apache.org/repos/asf/spark/blob/f3570bce/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala
----------------------------------------------------------------------
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala
index d31ccf9..504cea5 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala
@@ -126,7 +126,14 @@ private[sql] case class AggregateExpression(
     AttributeSet(childReferences)
   }
 
-  override def toString: String = s"($aggregateFunction,mode=$mode,isDistinct=$isDistinct)"
+  override def toString: String = {
+    val prefix = mode match {
+      case Partial => "partial_"
+      case PartialMerge => "merge_"
+      case Final | Complete => ""
+    }
+    prefix + aggregateFunction.toAggString(isDistinct)
+  }
 
   override def sql: String = aggregateFunction.sql(isDistinct)
 }
@@ -203,6 +210,12 @@ sealed abstract class AggregateFunction extends Expression with ImplicitCastInpu
     val distinct = if (isDistinct) "DISTINCT " else ""
     s"$prettyName($distinct${children.map(_.sql).mkString(", ")})"
   }
+
+  /** String representation used in explain plans. */
+  def toAggString(isDistinct: Boolean): String = {
+    val start = if (isDistinct) "(distinct " else "("
+    prettyName + flatArguments.mkString(start, ", ", ")")
+  }
 }
 
 /**

