[
https://issues.apache.org/jira/browse/HIVE-21690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vineet Garg updated HIVE-21690:
-------------------------------
Attachment: HIVE-21690.1.patch
> Support outer joins with HiveAggregateJoinTransposeRule and turn it on by
> default
> ---------------------------------------------------------------------------------
>
> Key: HIVE-21690
> URL: https://issues.apache.org/jira/browse/HIVE-21690
> Project: Hive
> Issue Type: Improvement
> Components: Query Planning
> Reporter: Vineet Garg
> Assignee: Vineet Garg
> Priority: Major
> Attachments: HIVE-21690.1.patch
>
>
> 1) This optimization, in which a GROUP BY is pushed down below a join, is off
> by default, and we would like to turn it on. In some cases the top aggregate
> is removed, but in most cases the optimization adds extra aggregate nodes.
> To determine whether those extra aggregates are beneficial (they might add
> overhead without reducing rows), the cost of the previous plan is computed
> and compared with the cost of the new plan.
> Since Hive's cost model only considers the cost of JOINs and discards the
> cost of all other nodes, this comparison always favors the new plan (adding
> an aggregate beneath the join reduces the number of rows processed by the
> join and therefore reduces the join cost). Turning on this optimization with
> the existing cost model is therefore not a good idea.
> One approach to fix this is to localize the cost computation to the rule
> itself, i.e. compute the non-cumulative cost of the existing aggregate and
> join and compare it with the cost of the new aggregates, join, and top
> aggregate.
> A better approach, in my opinion, would be to fix the cost model so that it
> takes aggregate cost into account (along with the join cost). This could
> affect other queries and cause performance regressions, but those will most
> likely be planning issues that should be investigated and fixed.
> 2) This optimization currently only supports INNER JOIN. It can be extended
> to support OUTER joins.
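
To make the cost argument in item 1 concrete, here is a minimal sketch of the two comparisons: a join-only comparison (as the description says the current model behaves) and a localized comparison that also counts aggregates. Everything in it is an assumption for illustration: the `Plan` class, the linear cost formulas, and the row counts are hypothetical and are not Hive's or Calcite's actual cost model or API.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    """Hypothetical plan summary: only the row counts needed for costing."""
    join_input_rows: tuple        # (left_rows, right_rows) feeding the join
    aggregate_input_rows: tuple   # input rows of every aggregate in the plan


def join_cost(left_rows, right_rows):
    # Assumed toy formula: join cost grows with the rows read from both sides.
    return left_rows + right_rows


def aggregate_cost(input_rows):
    # Assumed toy formula: aggregate cost grows with the rows it must group.
    return input_rows


def join_only_cost(plan):
    # Mimics a cost model that counts joins and ignores every other node.
    return join_cost(*plan.join_input_rows)


def local_cost(plan):
    # Localized comparison: join plus all aggregates touched by the rule.
    aggregates = sum(aggregate_cost(r) for r in plan.aggregate_input_rows)
    return join_cost(*plan.join_input_rows) + aggregates


# Original plan: join 1,000,000 x 1,000 rows, then one aggregate on the output.
original = Plan(join_input_rows=(1_000_000, 1_000),
                aggregate_input_rows=(1_000_000,))

# Rewritten plan where the pushed-down aggregate barely reduces rows
# (1,000,000 -> 900,000) and a top aggregate is still required.
weak_pushdown = Plan(join_input_rows=(900_000, 1_000),
                     aggregate_input_rows=(1_000_000, 900_000))

# Rewritten plan where the pushed-down aggregate reduces rows sharply
# (1,000,000 -> 10,000).
strong_pushdown = Plan(join_input_rows=(10_000, 1_000),
                       aggregate_input_rows=(1_000_000, 10_000))

# Join-only costing prefers the rewrite even when it does not pay off.
prefers_weak_join_only = join_only_cost(weak_pushdown) < join_only_cost(original)

# Localized costing rejects the weak rewrite and accepts the strong one.
accepts_weak_local = local_cost(weak_pushdown) < local_cost(original)
accepts_strong_local = local_cost(strong_pushdown) < local_cost(original)
```

With these assumed numbers, the join-only comparison accepts the weak rewrite (901,000 vs. 1,001,000), while the localized comparison rejects it (2,801,000 vs. 2,001,000) and still accepts the strong rewrite (1,021,000 vs. 2,001,000).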
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)