[
https://issues.apache.org/jira/browse/HIVE-21690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16833173#comment-16833173
]
Hive QA commented on HIVE-21690:
--------------------------------
Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12967839/HIVE-21690.1.patch
{color:red}ERROR:{color} -1 due to no test(s) being added or modified.
{color:red}ERROR:{color} -1 due to 19 failed/errored test(s), 15972 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[constant_prop_3] (batchId=48)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[correlationoptimizer2] (batchId=174)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_smb_ptf] (batchId=167)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[mapjoin_hint] (batchId=169)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[semijoin_reddedup] (batchId=170)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_scalar] (batchId=171)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_select] (batchId=171)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_scalar] (batchId=130)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_select] (batchId=130)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[cbo_query64] (batchId=285)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query64] (batchId=285)
org.apache.hadoop.hive.cli.TestTezPerfConstraintsCliDriver.testCliDriver[cbo_query30] (batchId=285)
org.apache.hadoop.hive.cli.TestTezPerfConstraintsCliDriver.testCliDriver[cbo_query54] (batchId=285)
org.apache.hadoop.hive.cli.TestTezPerfConstraintsCliDriver.testCliDriver[cbo_query64] (batchId=285)
org.apache.hadoop.hive.cli.TestTezPerfConstraintsCliDriver.testCliDriver[cbo_query81] (batchId=285)
org.apache.hadoop.hive.cli.TestTezPerfConstraintsCliDriver.testCliDriver[query30] (batchId=285)
org.apache.hadoop.hive.cli.TestTezPerfConstraintsCliDriver.testCliDriver[query54] (batchId=285)
org.apache.hadoop.hive.cli.TestTezPerfConstraintsCliDriver.testCliDriver[query64] (batchId=285)
org.apache.hadoop.hive.cli.TestTezPerfConstraintsCliDriver.testCliDriver[query81] (batchId=285)
{noformat}
Test results:
https://builds.apache.org/job/PreCommit-HIVE-Build/17118/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/17118/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-17118/
Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 19 tests failed
{noformat}
This message is automatically generated.
ATTACHMENT ID: 12967839 - PreCommit-HIVE-Build
> Support outer joins with HiveAggregateJoinTransposeRule and turn it on by
> default
> ---------------------------------------------------------------------------------
>
> Key: HIVE-21690
> URL: https://issues.apache.org/jira/browse/HIVE-21690
> Project: Hive
> Issue Type: Improvement
> Components: Query Planning
> Reporter: Vineet Garg
> Assignee: Vineet Garg
> Priority: Major
> Attachments: HIVE-21690.1.patch
>
>
> 1) This optimization is off by default. We would like to turn it on. It
> pushes the group by down below the join. In some cases the top aggregate is
> removed, but in most cases the optimization adds extra aggregate nodes. To
> measure whether those extra aggregates are beneficial (they might add
> overhead without reducing rows), the cost is computed and compared between
> the previous plan and the new plan.
> Since Hive's cost model only considers the JOIN's cost and discards the cost
> of the rest of the nodes, this comparison always favors the new plan (adding
> an aggregate beneath the join reduces the total number of rows processed by
> the join and therefore reduces the join cost). Therefore, turning on this
> optimization with the existing cost model is not a good idea.
> One approach to fix this is to localize the cost computation to the rule
> itself, i.e. compute the non-cumulative cost of the existing aggregate and
> join and compare it with the cost of the new aggregates, join, and top
> aggregate.
> A better approach, in my opinion, would be to fix the cost model and take
> aggregate cost into account (along with the join). This could affect other
> queries and cause performance regressions, but those will most likely be
> issues with the planning and should be investigated and fixed.
> 2) This optimization currently only supports INNER JOIN. It can be extended
> to support OUTER joins.
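For illustration only, the localized comparison described in point 1 could be sketched roughly as below. This is not the code in HIVE-21690.1.patch and does not use Hive or Calcite APIs. The class name, the method names, and the toy cost formulas (a join or aggregate costs one unit per input row) are all assumptions made for the example.

```java
// Hypothetical sketch: none of these names or formulas come from Hive or Calcite.
final class LocalCostSketch {

    // Assumed toy cost: a join pays one unit for every row read from either input.
    static double joinCost(double leftRows, double rightRows) {
        return leftRows + rightRows;
    }

    // Assumed toy cost: an aggregate pays one unit for every row it reads.
    static double aggregateCost(double inputRows) {
        return inputRows;
    }

    // Non-cumulative cost of the existing plan: Aggregate(Join(left, right)).
    static double originalCost(double leftRows, double rightRows, double joinOutRows) {
        return joinCost(leftRows, rightRows) + aggregateCost(joinOutRows);
    }

    // Non-cumulative cost of the rewritten plan:
    // TopAggregate(Join(Aggregate(left), right)).
    static double rewrittenCost(double leftRows, double leftAggOutRows,
                                double rightRows, double newJoinOutRows) {
        return aggregateCost(leftRows)
                + joinCost(leftAggOutRows, rightRows)
                + aggregateCost(newJoinOutRows);
    }

    // Accept the rewrite only if the whole local subtree gets cheaper,
    // not just the join.
    static boolean shouldPushAggregate(double leftRows, double rightRows,
                                       double joinOutRows, double leftAggOutRows,
                                       double newJoinOutRows) {
        return rewrittenCost(leftRows, leftAggOutRows, rightRows, newJoinOutRows)
                < originalCost(leftRows, rightRows, joinOutRows);
    }

    public static void main(String[] args) {
        // The pushed aggregate shrinks 1000 rows to 10: the rewrite pays off.
        System.out.println(shouldPushAggregate(1000, 100, 1000, 10, 10));
        // The pushed aggregate barely reduces rows: the join alone is slightly
        // cheaper, but the extra aggregates make the subtree more expensive.
        System.out.println(shouldPushAggregate(1000, 100, 1000, 990, 990));
    }
}
```

Under these assumed formulas, a join-only comparison would accept the rewrite in both cases in `main`, because the join always reads fewer rows after the push-down. Counting the aggregates rejects the second case.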