[jira] [Created] (HIVE-9007) Hive may generate wrong plan for map join queries due to IdentityProjectRemover [Spark Branch]

Chao (JIRA) Tue, 02 Dec 2014 10:41:58 -0800

Chao created HIVE-9007:
--------------------------

             Summary: Hive may generate wrong plan for map join queries due to 
IdentityProjectRemover [Spark Branch]
                 Key: HIVE-9007
                 URL: https://issues.apache.org/jira/browse/HIVE-9007
             Project: Hive
          Issue Type: Sub-task
          Components: Spark
    Affects Versions: spark-branch
            Reporter: Chao



HIVE-8435 introduces a new logical optimizer called IdentityProjectRemover, 
which may cause map join conversion in the spark branch to generate a wrong 
plan.

Currently, map join conversion in the spark branch first goes through the 
method {{convertJoinMapJoin}}, which replaces the join op with a mapjoin op, 
removes the RS (ReduceSink) associated with the big table, and keeps the RSs 
for all small tables. Afterwards, {{SparkReduceSinkMapJoinProc}} replaces all 
parent RSs of the mapjoin op with HTS (HashTableSink). Note that it doesn't 
check whether an RS belongs to a small table or to the big table.

The issue arises when IdentityProjectRemover comes into play, because it may 
result in an operator tree that has two consecutive RSs. Consider the 
following example:

{noformat}
          Join               MapJoin
          / \                /   \
        RS   RS   --->     RS     RS
       /      \           /         \
      TS       RS       TS          TS (big table)
                \      (small table)
                 TS
{noformat}

In this case, all parents of the mapjoin op are RSs, including the one on the 
big-table branch. {{SparkReduceSinkMapJoinProc}} will then replace all of them 
with HTS, which is incorrect for the big-table branch.
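
The failure mode described above can be reproduced on a toy model. The sketch 
below is not Hive code: {{Op}}, {{Kind}}, and both helper methods are 
hypothetical stand-ins that only mimic the behaviour this report attributes to 
{{convertJoinMapJoin}} and {{SparkReduceSinkMapJoinProc}}. It assumes the first 
step removes exactly one RS on the big-table branch, and that the second step 
rewrites every RS parent without checking which table it belongs to.

```java
import java.util.ArrayList;
import java.util.List;

final class MapJoinPlanSketch {
    // Toy operator kinds; these are NOT Hive's real operator classes.
    enum Kind { TS, RS, JOIN, MAPJOIN, HTS }

    static final class Op {
        Kind kind;
        final List<Op> parents = new ArrayList<>();

        Op(Kind kind, Op... parents) {
            this.kind = kind;
            for (Op p : parents) {
                this.parents.add(p);
            }
        }
    }

    // Step 1 (hypothetical model of convertJoinMapJoin): turn the join into a
    // mapjoin and splice out the single RS directly feeding the big-table side.
    static void convertJoinToMapJoin(Op join, int bigTablePos) {
        join.kind = Kind.MAPJOIN;
        Op bigTableRs = join.parents.get(bigTablePos);
        join.parents.set(bigTablePos, bigTableRs.parents.get(0));
    }

    // Step 2 (hypothetical model of SparkReduceSinkMapJoinProc): replace every
    // parent RS with HTS, without checking small table vs. big table.
    static void replaceParentRsWithHts(Op mapJoin) {
        for (Op parent : mapJoin.parents) {
            if (parent.kind == Kind.RS) {
                parent.kind = Kind.HTS;
            }
        }
    }

    // Builds the example tree, runs both steps, and returns the kind of the
    // operator that ends up as the big-table parent of the mapjoin.
    static Kind bigTableParentAfterConversion(boolean consecutiveRs) {
        Op smallBranch = new Op(Kind.RS, new Op(Kind.TS));
        Op innerRs = new Op(Kind.RS, new Op(Kind.TS));
        // With consecutiveRs, the big-table branch is TS -> RS -> RS -> Join.
        Op bigBranch = consecutiveRs ? new Op(Kind.RS, innerRs) : innerRs;
        Op join = new Op(Kind.JOIN, smallBranch, bigBranch);

        convertJoinToMapJoin(join, 1);
        replaceParentRsWithHts(join);
        return join.parents.get(1).kind;
    }

    public static void main(String[] args) {
        System.out.println(bigTableParentAfterConversion(false)); // prints TS
        System.out.println(bigTableParentAfterConversion(true));  // prints HTS
    }
}
```

With a single RS on the big-table branch, the mapjoin is fed directly by the 
TS. With two consecutive RSs, one RS survives the first step and is turned 
into an HTS by the second, which is the wrong plan shown in the diagram.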




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
