[ https://issues.apache.org/jira/browse/HIVE-13096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15160221#comment-15160221 ]
Jesus Camacho Rodriguez commented on HIVE-13096:
------------------------------------------------
[~ashutoshc], thanks for checking.
In fact, I had missed that the {{getCumulativeCost}} method was overridden for
Join operators (as it is part of _HiveRelMdDistinctRowCount_); thanks for
catching that. However, the default {{getCumulativeCost}} is still applied to
the rest of the operators (it is in _RelMdPercentageOriginalRows_).
Hence, to mimic the CBO cumulative cardinality estimation, we should combine both.
I have uploaded a new patch with the updated method.
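A minimal sketch of what "combining both" could look like, under two assumptions: the Join override takes the maximum cumulative cardinality over its inputs, while the default rule (as in Calcite's _RelMdPercentageOriginalRows_) sums the inputs; in both cases the operator then adds its own row count. The {{Op}} class, its fields, and {{cumulativeCardinality}} are hypothetical stand-ins, not Hive's operator or metadata API.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: cumulative cardinality over an operator tree that mixes
// a Join-specific rule with the default rule. Not Hive's API.
public class CumulativeCardinality {

  /** Minimal stand-in for a physical operator with an estimated row count. */
  static final class Op {
    final String name;
    final boolean isJoin;
    final long numRows;      // estimated output rows of this operator alone
    final List<Op> parents;  // inputs feeding this operator

    Op(String name, boolean isJoin, long numRows, Op... parents) {
      this.name = name;
      this.isJoin = isJoin;
      this.numRows = numRows;
      this.parents = Arrays.asList(parents);
    }
  }

  /**
   * Own row count plus the inputs' cumulative cardinality: the maximum over
   * inputs for joins (assumed Join override), the sum otherwise (default rule).
   */
  static long cumulativeCardinality(Op op) {
    long fromInputs = 0L;
    for (Op parent : op.parents) {
      long c = cumulativeCardinality(parent);
      fromInputs = op.isJoin ? Math.max(fromInputs, c) : fromInputs + c;
    }
    return fromInputs + op.numRows;
  }

  public static void main(String[] args) {
    Op scanA = new Op("TS_a", false, 1000);
    Op filterA = new Op("FIL_a", false, 100, scanA);
    Op scanB = new Op("TS_b", false, 500);
    Op join = new Op("JOIN", true, 200, filterA, scanB);
    System.out.println(cumulativeCardinality(join)); // prints 1300
  }
}
```

Here the filter branch accumulates 1000 + 100 = 1100 rows, so the join yields max(1100, 500) + 200 = 1300; applying the default summing rule to every operator would instead give 1800.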
> Cost to choose side table in MapJoin conversion based on cumulative cardinality
> -------------------------------------------------------------------------------
>
> Key: HIVE-13096
> URL: https://issues.apache.org/jira/browse/HIVE-13096
> Project: Hive
> Issue Type: Bug
> Components: Physical Optimizer
> Affects Versions: 2.0.0, 2.1.0
> Reporter: Jesus Camacho Rodriguez
> Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-13096.01.patch, HIVE-13096.patch
>
>
> HIVE-11954 changed the logic to choose the side table in the MapJoin
> conversion algorithm. The initial heuristic for the cost was based on the
> number of heavyweight operators.
> This extends that work so that the heuristic is based on cumulative cardinality.
> In the future, we should choose the side based on the total latency for the input.
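One way the side table could be chosen from such estimates, sketched under stated assumptions: the input with the highest cumulative cardinality is kept as the streamed big table, and every other input becomes an in-memory side table, provided their combined size fits under a memory limit. {{JoinInput}}, {{chooseBigTablePos}}, and {{maxSideTableBytes}} are hypothetical names introduced for illustration; this is not taken from the attached patch.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of MapJoin side-table selection driven by cumulative
// cardinality. Names and the memory rule are illustrative assumptions.
public class MapJoinSideChooser {

  /** Estimates for one join input (hypothetical fields). */
  static final class JoinInput {
    final String name;
    final long cumulativeCardinality; // rows processed to produce this input
    final long dataSize;              // bytes needed if this input is loaded in memory

    JoinInput(String name, long cumulativeCardinality, long dataSize) {
      this.name = name;
      this.cumulativeCardinality = cumulativeCardinality;
      this.dataSize = dataSize;
    }
  }

  /**
   * Returns the position of the big (streamed) table, or -1 if no choice keeps
   * the combined size of the side tables within maxSideTableBytes.
   */
  static int chooseBigTablePos(List<JoinInput> inputs, long maxSideTableBytes) {
    long totalSize = 0L;
    for (JoinInput input : inputs) {
      totalSize += input.dataSize;
    }
    int bestPos = -1;
    long bestCardinality = Long.MIN_VALUE;
    for (int pos = 0; pos < inputs.size(); pos++) {
      JoinInput candidate = inputs.get(pos);
      // Every input except the candidate would become a side table.
      long sideTableBytes = totalSize - candidate.dataSize;
      if (sideTableBytes > maxSideTableBytes) {
        continue; // side tables would not fit in memory
      }
      // Among feasible choices, stream the input with the highest cumulative cardinality.
      if (candidate.cumulativeCardinality > bestCardinality) {
        bestCardinality = candidate.cumulativeCardinality;
        bestPos = pos;
      }
    }
    return bestPos;
  }

  public static void main(String[] args) {
    List<JoinInput> inputs = Arrays.asList(
        new JoinInput("a", 1300, 800),
        new JoinInput("b", 500, 50),
        new JoinInput("c", 20, 10));
    System.out.println(chooseBigTablePos(inputs, 100)); // prints 0
  }
}
```

Checking memory first and ranking by cumulative cardinality second keeps size as a hard constraint; the cardinality heuristic only decides among the choices that are feasible.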