[ https://issues.apache.org/jira/browse/HIVE-13096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15160221#comment-15160221 ]
Jesus Camacho Rodriguez commented on HIVE-13096:
------------------------------------------------

[~ashutoshc], thanks for checking.

In fact, I had missed that the {{getCumulativeCost}} method is overridden for Join operators (the override is part of _HiveRelMdDistinctRowCount_); thanks for catching that. However, the default {{getCumulativeCost}} (in _RelMdPercentageOriginalRows_) still applies to the rest of the operators. Hence, to mimic the CBO cumulative cardinality estimation, we should combine both. I have uploaded a new patch with the updated method.

> Cost to choose side table in MapJoin conversion based on cumulative cardinality
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-13096
>                 URL: https://issues.apache.org/jira/browse/HIVE-13096
>             Project: Hive
>          Issue Type: Bug
>          Components: Physical Optimizer
>    Affects Versions: 2.0.0, 2.1.0
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>         Attachments: HIVE-13096.01.patch, HIVE-13096.patch
>
>
> HIVE-11954 changed the logic to choose the side table in the MapJoin conversion algorithm. The initial heuristic for the cost was based on the number of heavyweight operators.
> This extends that work so the heuristic is based on accumulated cardinality.
> In the future, we should choose the side based on total latency for the input.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)