[ https://issues.apache.org/jira/browse/HIVE-15234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15674667#comment-15674667 ]

Ashutosh Chauhan commented on HIVE-15234:
-----------------------------------------

There are two ways to address this:
* Option 1: Get rid of the {{leftSemiJoin}} field in Join and make the stats
computation logic work on HiveSemiJoin.
* Option 2: Get rid of HiveSemiJoin and make all the rules work on the
{{leftSemiJoin}} field of Join.

I initially went with option 2) but quickly found that (all?) current Calcite
rules work correctly with SemiJoin, but don't (and cannot) take into account a
field hidden in HiveJoin. RelFieldTrimmer and filterJointranspose were the ones
I found, but I assume the same is true for many other rules, since otherwise we
would get exceptions on current master. Thus I think option 1) is better here.
Also, function signatures will force a developer to handle HiveSemiJoin, but a
field hidden in the Join rel node won't. Thus having an explicit SemiJoin is
more robust.
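To make the robustness argument concrete, here is a minimal Java sketch. It is a hypothetical, simplified model: the classes Scan, Join, SemiJoin and the rule filterLeftInput are invented for illustration and are not Hive's or Calcite's actual classes, and the row-count estimate (capping a semi-join at its left input's row count) is an assumed toy formula, not Hive's stats logic. A rule written against the join type rebuilds it through a convenience constructor and silently drops the hidden semi-join flag, while the explicit semi-join node is a separate type the rule never matches.

```java
// Hypothetical, simplified plan model; not Hive's or Calcite's real class hierarchy.
public class SemiJoinSketch {

    interface RelNode { double rowCount(); }

    record Scan(double rows) implements RelNode {
        public double rowCount() { return rows; }
    }

    // Option 2 style: semi-join semantics hidden in a boolean field on a plain join.
    record Join(RelNode left, RelNode right, double selectivity, boolean leftSemiJoin)
            implements RelNode {
        // Convenience constructor that code unaware of the flag tends to use.
        Join(RelNode left, RelNode right, double selectivity) {
            this(left, right, selectivity, false);
        }
        public double rowCount() {
            double inner = left.rowCount() * right.rowCount() * selectivity;
            // Toy estimate: a semi-join never emits more rows than its left input.
            return leftSemiJoin ? Math.min(inner, left.rowCount()) : inner;
        }
    }

    // Option 1 style: an explicit node type, visible to anything that inspects types.
    record SemiJoin(RelNode left, RelNode right, double selectivity) implements RelNode {
        public double rowCount() {
            return Math.min(left.rowCount() * right.rowCount() * selectivity, left.rowCount());
        }
    }

    // Hypothetical rule: filter the left input of a join, then rebuild the join.
    static RelNode filterLeftInput(RelNode node, double filterSelectivity) {
        if (node instanceof Join j) {
            RelNode newLeft = new Scan(j.left().rowCount() * filterSelectivity);
            // Compiles fine, but leftSemiJoin is silently reset to false.
            return new Join(newLeft, j.right(), j.selectivity());
        }
        // SemiJoin is a different type, so this rule does not touch it.
        return node;
    }

    public static void main(String[] args) {
        RelNode left = new Scan(1000), right = new Scan(100);
        RelNode flagged = new Join(left, right, 0.25, true);
        RelNode explicit = new SemiJoin(left, right, 0.25);
        System.out.println(flagged.rowCount());
        System.out.println(explicit.rowCount());
        System.out.println(filterLeftInput(flagged, 0.5).rowCount());
        System.out.println(filterLeftInput(explicit, 0.5).rowCount());
    }
}
```

Running main prints 1000.0 twice, then 12500.0 for the rebuilt flagged join (its semi-join cap is gone, so it is estimated as an inner join), and 1000.0 for the explicit node, which the rule left alone. In this model, forgetting to handle the explicit type only means a rule does not fire, whereas forgetting the hidden field silently changes semantics and estimates.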

> Semijoin cardinality estimation can be improved
> -----------------------------------------------
>
>                 Key: HIVE-15234
>                 URL: https://issues.apache.org/jira/browse/HIVE-15234
>             Project: Hive
>          Issue Type: Bug
>          Components: CBO, Logical Optimizer
>    Affects Versions: 2.0.0, 2.1.0
>            Reporter: Ashutosh Chauhan
>            Assignee: Ashutosh Chauhan
>
> Currently, Calcite optimization rules rely on (Hive)SemiJoin to represent a
> semi-join node, whereas stats estimation uses the {{leftSemiJoin}} field of
> Join to estimate stats. As a result, the semi-join-specific stats calculation
> logic is never hit, since at plan generation time a HiveSemiJoin is created
> and the leftSemiJoin field of Join is never set.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)