[
https://issues.apache.org/jira/browse/HIVE-16757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16028161#comment-16028161
]
Hive QA commented on HIVE-16757:
--------------------------------
Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12870284/HIVE-16757.04.patch
{color:red}ERROR:{color} -1 due to no test(s) being added or modified.
{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10788 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed] (batchId=237)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_scalar] (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=145)
{noformat}
Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5472/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5472/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5472/
Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}
This message is automatically generated.
ATTACHMENT ID: 12870284 - PreCommit-HIVE-Build
> Use of deprecated getRows() instead of new
> estimateRowCount(RelMetadataQuery..) has serious performance impact
> --------------------------------------------------------------------------------------------------------------
>
> Key: HIVE-16757
> URL: https://issues.apache.org/jira/browse/HIVE-16757
> Project: Hive
> Issue Type: Bug
> Components: Query Planning
> Reporter: Remus Rusanu
> Assignee: Remus Rusanu
> Attachments: HIVE-16757.01.patch, HIVE-16757.02.patch,
> HIVE-16757.03.patch, HIVE-16757.04.patch
>
>
> Calling Calcite's {{RelMetadataQuery.instance()}} is very expensive because
> it places a new memoization cache on the stack. Hidden in the deprecated
> {{AbstractRelNode.getRows()}} call is a call to {{instance()}}. In Hive we
> have a number of places where we call the deprecated {{getRows()}}
> instead of the new API {{estimateRowCount(RelMetadataQuery mq)}}, which
> accepts the RelMetadataQuery; in most of those places we already have one
> handy to pass. For a complex query (49 joins) there are 2995340 calls to
> {{AbstractRelNode.getRows}}, each one throwing the current memoization cache
> away.
> Was: -On complex queries HiveRelMdRowCount.getRowCount can get called many
> times. Since it does not memoize its result and the call is recursive, it
> results in an explosion of calls. For example, for a query with 49 joins,
> during join ordering (LoptOptimizeJoinRule) HiveRelMdRowCount.getRowCount
> gets called 6442 times as a top-level call, but the recursion exploded this
> to 501729 calls. Memoization of the result would stop the recursion early.
> In my testing this reduced the join reordering time for said query from 11s
> to <1s.-
> Note there is no need for {{HiveRelMdRowCount}} memoization because the
> function is called in stacks similar to this:
> {code}
> at org.apache.hadoop.hive.ql.optimizer.calcite.stats.HiveRelMdRowCount.getRowCount(HiveRelMdRowCount.java:66)
> at GeneratedMetadataHandler_RowCount.getRowCount_$
> at GeneratedMetadataHandler_RowCount.getRowCount
> at org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:204)
> at org.apache.calcite.rel.rules.LoptOptimizeJoinRule.swapInputs(LoptOptimizeJoinRule.java:1865)
> at org.apache.calcite.rel.rules.LoptOptimizeJoinRule.createJoinSubtree(LoptOptimizeJoinRule.java:1739)
> {code}
> and {{GeneratedMetadataHandler_RowCount.getRowCount}} handles memoization.
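
The cost difference described in the issue can be illustrated with a small self-contained model. This is only a sketch and not Calcite or Hive code: {{MetadataQuery}} and {{Node}} are hypothetical stand-ins for {{RelMetadataQuery}} and {{AbstractRelNode}}, and the 0.1 join selectivity is an arbitrary assumption. It counts how many row-count computations happen when every call builds a fresh cache (the deprecated {{getRows()}} style) versus when one query object is passed through (the {{estimateRowCount(mq)}} style).

```java
import java.util.HashMap;
import java.util.Map;

class RowCountMemoDemo {
    static int instancesCreated = 0; // how many caches were allocated
    static int computations = 0;     // how many cache misses were computed

    /** Hypothetical stand-in for RelMetadataQuery: each instance owns its own cache. */
    static final class MetadataQuery {
        private final Map<Node, Double> cache = new HashMap<>();

        static MetadataQuery instance() {
            instancesCreated++;
            return new MetadataQuery();
        }

        double getRowCount(Node node) {
            Double cached = cache.get(node);
            if (cached != null) {
                return cached; // memoized: no recursion into the inputs
            }
            computations++;
            double rows = node.computeRowCount(this);
            cache.put(node, rows);
            return rows;
        }
    }

    /** Hypothetical stand-in for a relational node: a leaf scan or a binary join. */
    static final class Node {
        final Node left;
        final Node right;
        final double baseRows;

        Node(double baseRows) { this.left = null; this.right = null; this.baseRows = baseRows; }
        Node(Node left, Node right) { this.left = left; this.right = right; this.baseRows = 0; }

        double computeRowCount(MetadataQuery mq) {
            if (left == null) {
                return baseRows;
            }
            // Assumed selectivity of 0.1, for illustration only.
            return mq.getRowCount(left) * mq.getRowCount(right) * 0.1;
        }

        /** Deprecated style: hides a call to instance(), so the cache is discarded every time. */
        double getRows() { return estimateRowCount(MetadataQuery.instance()); }

        /** New style: the caller supplies the query object, so its cache is reused. */
        double estimateRowCount(MetadataQuery mq) { return mq.getRowCount(this); }
    }

    static Node leftDeepJoin(int joins) {
        Node root = new Node(10.0);
        for (int i = 0; i < joins; i++) {
            root = new Node(root, new Node(10.0));
        }
        return root;
    }

    /** Returns {instancesCreated, computations} for the given calling style. */
    static int[] run(boolean shareQuery, int joins, int calls) {
        instancesCreated = 0;
        computations = 0;
        Node root = leftDeepJoin(joins);
        MetadataQuery shared = shareQuery ? MetadataQuery.instance() : null;
        for (int i = 0; i < calls; i++) {
            if (shareQuery) {
                root.estimateRowCount(shared);
            } else {
                root.getRows();
            }
        }
        return new int[] {instancesCreated, computations};
    }

    public static void main(String[] args) {
        int[] fresh = run(false, 49, 1000);
        int[] shared = run(true, 49, 1000);
        System.out.println("getRows():            caches=" + fresh[0] + " computations=" + fresh[1]);
        System.out.println("estimateRowCount(mq): caches=" + shared[0] + " computations=" + shared[1]);
    }
}
```

In this model the fresh-cache style recomputes every node of the plan on each call, while the shared style computes each node once and answers later calls from the cache.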
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)