Repository: spark
Updated Branches:
  refs/heads/master 5c80643d1 -> 78cb08a5d


[SPARK-5404] [SQL] Update the default statistic number

By default, the size estimate for a logical plan with multiple children is 
quite aggressive. Because these statistics are critical for join optimization, 
we need to estimate them as accurately as possible.

`Union` has two children, so override the default implementation to add its 
children's `sizeInBytes` instead of multiplying them.
`Expand` has only a single child but grows the data size, so multiply the 
child's `sizeInBytes` by the inflation factor (the number of projections).
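
To illustrate the difference between the three estimates, here is a minimal, 
self-contained sketch. It does not use the real Catalyst classes: `Plan`, 
`Leaf`, and `DefaultBinary` are hypothetical stand-ins, and `Expand` takes a 
plain `projectionCount` in place of `projections.length`. Only the arithmetic 
mirrors the change described above.

```scala
// Simplified stand-ins for Catalyst plan nodes (assumptions, not Spark's API).
final case class Statistics(sizeInBytes: BigInt)

sealed trait Plan {
  def children: Seq[Plan]
  // Default estimate: the product of the children's sizes.
  def statistics: Statistics =
    Statistics(children.map(_.statistics.sizeInBytes).product)
}

// Hypothetical leaf node with a known size.
final case class Leaf(size: BigInt) extends Plan {
  val children: Seq[Plan] = Nil
  override def statistics: Statistics = Statistics(size)
}

// Hypothetical two-child node that keeps the default (product) estimate.
final case class DefaultBinary(left: Plan, right: Plan) extends Plan {
  val children: Seq[Plan] = Seq(left, right)
}

// Union: the output is at most the sum of both inputs.
final case class Union(left: Plan, right: Plan) extends Plan {
  val children: Seq[Plan] = Seq(left, right)
  override def statistics: Statistics =
    Statistics(left.statistics.sizeInBytes + right.statistics.sizeInBytes)
}

// Expand: each input row is emitted once per projection.
final case class Expand(projectionCount: Int, child: Plan) extends Plan {
  val children: Seq[Plan] = Seq(child)
  override def statistics: Statistics =
    Statistics(child.statistics.sizeInBytes * projectionCount)
}

object StatisticsSketch {
  def main(args: Array[String]): Unit = {
    val a = Leaf(BigInt(1000))
    val b = Leaf(BigInt(2000))
    println(DefaultBinary(a, b).statistics.sizeInBytes) // 2000000
    println(Union(a, b).statistics.sizeInBytes)         // 3000
    println(Expand(3, a).statistics.sizeInBytes)        // 3000
  }
}
```

With the product-based default, two small inputs already yield an estimate of 
about 2 MB, which could wrongly rule out size-based join strategies; the 
additive estimate for `Union` stays at 3000 bytes.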

Author: Cheng Hao <[email protected]>

Closes #4914 from chenghao-intel/statistic and squashes the following commits:

d466bbc [Cheng Hao] Update the default statistic


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/78cb08a5
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/78cb08a5
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/78cb08a5

Branch: refs/heads/master
Commit: 78cb08a5db7b3e1b61ffb28bc95d0b23e8db5c40
Parents: 5c80643
Author: Cheng Hao <[email protected]>
Authored: Tue Mar 17 19:32:38 2015 -0700
Committer: Michael Armbrust <[email protected]>
Committed: Tue Mar 17 19:32:38 2015 -0700

----------------------------------------------------------------------
 .../sql/catalyst/plans/logical/basicOperators.scala     | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/78cb08a5/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala
----------------------------------------------------------------------
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala
index 20cc8e9..624912d 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala
@@ -81,6 +81,11 @@ case class Union(left: LogicalPlan, right: LogicalPlan) extends BinaryNode {
   override lazy val resolved =
     childrenResolved &&
     !left.output.zip(right.output).exists { case (l,r) => l.dataType != r.dataType }
+
+  override def statistics: Statistics = {
+    val sizeInBytes = left.statistics.sizeInBytes + right.statistics.sizeInBytes
+    Statistics(sizeInBytes = sizeInBytes)
+  }
 }
 
 case class Join(
@@ -174,7 +179,12 @@ case class Aggregate(
 case class Expand(
     projections: Seq[GroupExpression],
     output: Seq[Attribute],
-    child: LogicalPlan) extends UnaryNode
+    child: LogicalPlan) extends UnaryNode {
+  override def statistics: Statistics = {
+    val sizeInBytes = child.statistics.sizeInBytes * projections.length
+    Statistics(sizeInBytes = sizeInBytes)
+  }
+}
 
 trait GroupingAnalytics extends UnaryNode {
   self: Product =>

