[jira] [Commented] (HIVE-21690) Support outer joins with HiveAggregateJoinTransposeRule and turn it on by default

Hive QA (JIRA) Sat, 04 May 2019 16:03:32 -0700


    [ https://issues.apache.org/jira/browse/HIVE-21690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16833173#comment-16833173 ]


Hive QA commented on HIVE-21690:
--------------------------------



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12967839/HIVE-21690.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 19 failed/errored test(s), 15972 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[constant_prop_3] (batchId=48)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[correlationoptimizer2] (batchId=174)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_smb_ptf] (batchId=167)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[mapjoin_hint] (batchId=169)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[semijoin_reddedup] (batchId=170)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_scalar] (batchId=171)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_select] (batchId=171)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_scalar] (batchId=130)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_select] (batchId=130)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[cbo_query64] (batchId=285)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query64] (batchId=285)
org.apache.hadoop.hive.cli.TestTezPerfConstraintsCliDriver.testCliDriver[cbo_query30] (batchId=285)
org.apache.hadoop.hive.cli.TestTezPerfConstraintsCliDriver.testCliDriver[cbo_query54] (batchId=285)
org.apache.hadoop.hive.cli.TestTezPerfConstraintsCliDriver.testCliDriver[cbo_query64] (batchId=285)
org.apache.hadoop.hive.cli.TestTezPerfConstraintsCliDriver.testCliDriver[cbo_query81] (batchId=285)
org.apache.hadoop.hive.cli.TestTezPerfConstraintsCliDriver.testCliDriver[query30] (batchId=285)
org.apache.hadoop.hive.cli.TestTezPerfConstraintsCliDriver.testCliDriver[query54] (batchId=285)
org.apache.hadoop.hive.cli.TestTezPerfConstraintsCliDriver.testCliDriver[query64] (batchId=285)
org.apache.hadoop.hive.cli.TestTezPerfConstraintsCliDriver.testCliDriver[query81] (batchId=285)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/17118/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/17118/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-17118/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 19 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12967839 - PreCommit-HIVE-Build

> Support outer joins with HiveAggregateJoinTransposeRule and turn it on by default
> ---------------------------------------------------------------------------------
>
>                 Key: HIVE-21690
>                 URL: https://issues.apache.org/jira/browse/HIVE-21690
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Planning
>            Reporter: Vineet Garg
>            Assignee: Vineet Garg
>            Priority: Major
>         Attachments: HIVE-21690.1.patch
>
>
> 1) This optimization is off by default. We would like to turn it on. The
> optimization pushes the group by down below the join; in some cases the top
> aggregate is removed, but in most cases it adds extra aggregate nodes. To
> measure whether those extra aggregates are beneficial (they might add
> overhead without reducing rows), the cost is computed and compared between
> the previous plan and the new plan.
> Since Hive's cost model only considers the JOIN's cost and discards the cost
> of the rest of the nodes, this comparison always favors the new plan (adding
> an aggregate beneath the join reduces the total number of rows processed by
> the join and therefore reduces the join cost). Therefore, turning on this
> optimization with the existing cost model is not a good idea.
> One approach to fix this is to localize the cost computation to the rule
> itself, i.e. compute the non-cumulative cost of the existing aggregate and
> join and compare it with the cost of the new aggregates, join, and top
> aggregate.
> A better approach, in my opinion, would be to fix the cost model to take
> aggregate cost into account (along with the join). This could affect other
> queries and cause performance regressions, but those will most likely be
> issues with the planning and should be investigated and fixed.
> 2) This optimization currently only supports INNER JOIN. It can be extended
> to support OUTER joins.
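
The rule-local comparison described in point 1 can be sketched as follows. This is only an illustration under a toy cost model in which every operator costs one unit per input row; the class name, method names, and row counts are all hypothetical and are not Hive or Calcite APIs.

```java
public class LocalCostSketch {
    // ASSUMPTION: toy per-operator (non-cumulative) costs, one unit per input row.
    static double aggregateCost(double inputRows) {
        return inputRows;
    }

    static double joinCost(double leftRows, double rightRows) {
        return leftRows + rightRows;
    }

    // Existing plan: Aggregate(Join(left, right)).
    static double costBefore(double leftRows, double rightRows, double joinOutRows) {
        return joinCost(leftRows, rightRows) + aggregateCost(joinOutRows);
    }

    // Transposed plan: TopAggregate(Join(Aggregate(left), right)).
    static double costAfter(double leftRows, double leftRowsAfterAgg,
                            double rightRows, double joinOutRowsAfter) {
        return aggregateCost(leftRows)               // new aggregate beneath the join
             + joinCost(leftRowsAfterAgg, rightRows) // join over the reduced input
             + aggregateCost(joinOutRowsAfter);      // top aggregate that remains
    }

    // Comparison that looks only at the join cost, as the description says
    // the existing cost model effectively does.
    static boolean joinOnlyPrefersTranspose(double leftRows, double leftRowsAfterAgg,
                                            double rightRows) {
        return joinCost(leftRowsAfterAgg, rightRows) < joinCost(leftRows, rightRows);
    }

    // Rule-local comparison that also counts the aggregates.
    static boolean shouldTranspose(double leftRows, double leftRowsAfterAgg,
                                   double rightRows, double joinOutRowsBefore,
                                   double joinOutRowsAfter) {
        return costAfter(leftRows, leftRowsAfterAgg, rightRows, joinOutRowsAfter)
             < costBefore(leftRows, rightRows, joinOutRowsBefore);
    }

    public static void main(String[] args) {
        // Pushed aggregate collapses 1,000,000 rows to 1,000.
        System.out.println(shouldTranspose(1_000_000, 1_000, 10_000, 1_000_000, 1_000));
        // Pushed aggregate barely reduces rows: compare the two decisions.
        System.out.println(joinOnlyPrefersTranspose(1_000_000, 999_000, 10_000));
        System.out.println(shouldTranspose(1_000_000, 999_000, 10_000, 1_000_000, 999_000));
    }
}
```

In the second scenario the join-only comparison still prefers the transposed plan, while the local comparison rejects it because the extra aggregates cost more than the join saves. That is the failure mode the description attributes to the existing cost model.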



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
