Repository: spark Updated Branches: refs/heads/master 5c80643d1 -> 78cb08a5d
[SPARK-5404] [SQL] Update the default statistic number By default, the statistic for logical plan with multiple children is quite aggressive, and those statistic are quite critical for the join optimization, hence we need to estimate the statistics as accurate as possible. For `Union`, which has 2 children, and overwrite the default implementation by `adding` its children `byteInSize` instead of `multiplying`. For `Expand`, which only has a single child, but it will grows the size, and we need to multiply its inflating factor. Author: Cheng Hao <[email protected]> Closes #4914 from chenghao-intel/statistic and squashes the following commits: d466bbc [Cheng Hao] Update the default statistic Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/78cb08a5 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/78cb08a5 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/78cb08a5 Branch: refs/heads/master Commit: 78cb08a5db7b3e1b61ffb28bc95d0b23e8db5c40 Parents: 5c80643 Author: Cheng Hao <[email protected]> Authored: Tue Mar 17 19:32:38 2015 -0700 Committer: Michael Armbrust <[email protected]> Committed: Tue Mar 17 19:32:38 2015 -0700 ---------------------------------------------------------------------- .../sql/catalyst/plans/logical/basicOperators.scala | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/spark/blob/78cb08a5/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala ---------------------------------------------------------------------- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala index 20cc8e9..624912d 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala @@ -81,6 +81,11 @@ case class Union(left: LogicalPlan, right: LogicalPlan) extends BinaryNode { override lazy val resolved = childrenResolved && !left.output.zip(right.output).exists { case (l,r) => l.dataType != r.dataType } + + override def statistics: Statistics = { + val sizeInBytes = left.statistics.sizeInBytes + right.statistics.sizeInBytes + Statistics(sizeInBytes = sizeInBytes) + } } case class Join( @@ -174,7 +179,12 @@ case class Aggregate( case class Expand( projections: Seq[GroupExpression], output: Seq[Attribute], - child: LogicalPlan) extends UnaryNode + child: LogicalPlan) extends UnaryNode { + override def statistics: Statistics = { + val sizeInBytes = child.statistics.sizeInBytes * projections.length + Statistics(sizeInBytes = sizeInBytes) + } +} trait GroupingAnalytics extends UnaryNode { self: Product => --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
