Hyunsik Choi created TAJO-926:
---------------------------------

             Summary: Join condition including column references of a row-preserving table in left outer join causes incorrect result
                 Key: TAJO-926
                 URL: https://issues.apache.org/jira/browse/TAJO-926
             Project: Tajo
          Issue Type: Bug
          Components: physical operator, planner/optimizer
            Reporter: Hyunsik Choi
            Assignee: Hyunsik Choi
             Fix For: 0.9.0


This patch fixes two bugs.

The first is an incorrect projection push down (PPD). The following query 
reproduces the bug:

{noformat}
select
  r_name,
  r_regionkey,
  n_name,
  n_regionkey
from
  region left outer join nation
    on n_regionkey = r_regionkey and r_name in ('AMERICA', 'ASIA')
order by r_name;
{noformat}

The above query includes one left outer join (LOJ) and one join filter. Since 
the join filter {{R_NAME in ('AMERICA', 'ASIA')}} references a column of the 
row-preserving table {{region}}, the filter is placed on the LOJ operator. PPD 
then pushes down only the RowConstantEval sub-expression and replaces the 
right-hand side (RHS) of the IN predicate with a FieldEval. However, we assume 
that the RHS of InEval is always a RowConstantEval. This is the main cause of 
this bug.

{noformat}
2014-07-09 16:39:37,527 ERROR: org.apache.tajo.worker.Task (run(395)) - org.apache.tajo.engine.eval.FieldEval cannot be cast to org.apache.tajo.engine.eval.RowConstantEval
java.lang.ClassCastException: org.apache.tajo.engine.eval.FieldEval cannot be cast to org.apache.tajo.engine.eval.RowConstantEval
        at org.apache.tajo.engine.eval.InEval.eval(InEval.java:62)
        at org.apache.tajo.engine.eval.BinaryEval.eval(BinaryEval.java:104)
        at org.apache.tajo.engine.planner.physical.NLLeftOuterJoinExec.next(NLLeftOuterJoinExec.java:109)
        at org.apache.tajo.engine.planner.physical.ExternalSortExec.sortAndStoreAllChunks(ExternalSortExec.java:201)
        at org.apache.tajo.engine.planner.physical.ExternalSortExec.next(ExternalSortExec.java:278)
        at org.apache.tajo.engine.planner.physical.RangeShuffleFileWriteExec.next(RangeShuffleFileWriteExec.java:99)
        at org.apache.tajo.worker.Task.run(Task.java:388)
        at org.apache.tajo.worker.TaskRunner$1.run(TaskRunner.java:406)
        at java.lang.Thread.run(Thread.java:744)
2014-07-09 16:39:37,528 INFO: org.apache.tajo.worker.TaskAttemptContext (setState(115)) - Query status of ta_1404891573341_0004_000003_000000_02 is changed to TA_FAILED
2014-07-09 16:39:37,529 INFO: org.apache.tajo.worker.Task (run(452)) - Worker's task counter - total:3, succeeded: 0, killed: 3, failed: 3
{noformat}
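
The failure mode in the log above can be illustrated with a minimal sketch. 
The classes below ({{Expr}}, {{RowConstant}}, {{FieldRef}}, {{evalIn}}) are 
hypothetical, simplified stand-ins and are not Tajo's actual InEval, 
RowConstantEval, or FieldEval. The sketch only assumes what the description 
states: the IN evaluator casts its RHS to a constant list unconditionally, so 
a plan rewrite that substitutes a field reference fails at evaluation time.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-ins for evaluation nodes; not Tajo's real classes.
class InPredicateSketch {
    interface Expr {}

    // A literal value list such as ('AMERICA', 'ASIA').
    static final class RowConstant implements Expr {
        final Set<String> values;
        RowConstant(String... values) {
            this.values = new HashSet<>(Arrays.asList(values));
        }
    }

    // A reference to a column by name.
    static final class FieldRef implements Expr {
        final String name;
        FieldRef(String name) { this.name = name; }
    }

    // Evaluates `lhs IN rhs`, assuming rhs is always a constant list.
    static boolean evalIn(FieldRef lhs, Expr rhs, Map<String, String> tuple) {
        // If a rewrite replaced rhs with a FieldRef, this cast throws
        // ClassCastException, mirroring the error in the log.
        RowConstant list = (RowConstant) rhs;
        return list.values.contains(tuple.get(lhs.name));
    }
}
```

With a {{RowConstant}} on the RHS the predicate evaluates normally; passing a 
{{FieldRef}} instead raises a ClassCastException at the cast.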

The second bug is that HashLeftOuterJoin produces a wrong result when its join 
filter references the row-preserving table, as in the above example query. 
To fix this bug, we have to skip the right iterator of the hash table when the 
joined tuple is filtered out.

Expected:
{noformat}
r_name,r_regionkey,n_name,n_regionkey
-------------------------------
AFRICA,0,null,null
AMERICA,1,ARGENTINA,1
AMERICA,1,BRAZIL,1
AMERICA,1,CANADA,1
AMERICA,1,PERU,1
AMERICA,1,UNITED STATES,1
ASIA,2,INDIA,2
ASIA,2,INDONESIA,2
ASIA,2,JAPAN,2
ASIA,2,CHINA,2
ASIA,2,VIETNAM,2
EUROPE,3,null,null
MIDDLE EAST,4,null,null
{noformat}

Actual result:
{noformat}
r_name,r_regionkey,n_name,n_regionkey
-------------------------------
AFRICA,0,null,null
AFRICA,0,null,null
AFRICA,0,null,null
AFRICA,0,null,null
AFRICA,0,null,null
AMERICA,1,ARGENTINA,1
AMERICA,1,BRAZIL,1
AMERICA,1,CANADA,1
AMERICA,1,PERU,1
AMERICA,1,UNITED STATES,1
ASIA,2,INDIA,2
ASIA,2,INDONESIA,2
ASIA,2,JAPAN,2
ASIA,2,CHINA,2
ASIA,2,VIETNAM,2
EUROPE,3,null,null
EUROPE,3,null,null
EUROPE,3,null,null
EUROPE,3,null,null
EUROPE,3,null,null
MIDDLE EAST,4,null,null
MIDDLE EAST,4,null,null
MIDDLE EAST,4,null,null
MIDDLE EAST,4,null,null
MIDDLE EAST,4,null,null
{noformat}
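
The difference between the two results can be sketched as follows. This is an 
illustrative hash left outer join, not Tajo's HashLeftOuterJoin operator; the 
class name, the {{join}} method, and the {{String[]}} row layout 
({{[key, name]}}) are assumptions made for the example. It shows the intended 
semantics: when the join filter rejects every candidate from the hash table, 
the left tuple is emitted once with nulls, rather than once per rejected 
right tuple.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

// Illustrative sketch only; rows are String[]{key, name}.
class HashLeftOuterJoinSketch {
    static List<String> join(List<String[]> left, List<String[]> right,
                             BiPredicate<String[], String[]> joinFilter) {
        // Build phase: hash the right (null-supplying) side by join key.
        Map<String, List<String[]>> table = new HashMap<>();
        for (String[] r : right) {
            table.computeIfAbsent(r[0], k -> new ArrayList<>()).add(r);
        }

        List<String> out = new ArrayList<>();
        // Probe phase: one pass over the left (row-preserving) side.
        for (String[] l : left) {
            boolean matched = false;
            for (String[] r : table.getOrDefault(l[0], new ArrayList<>())) {
                if (joinFilter.test(l, r)) {
                    out.add(l[1] + "," + l[0] + "," + r[1] + "," + r[0]);
                    matched = true;
                }
                // A rejected candidate emits nothing here.
            }
            // Exactly one null-padded row per unmatched left tuple.
            if (!matched) {
                out.add(l[1] + "," + l[0] + ",null,null");
            }
        }
        return out;
    }
}
```

For a region such as AFRICA that fails the filter, this sketch emits a single 
null-padded row no matter how many nation rows share its key, which matches 
the expected result above.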



--
This message was sent by Atlassian JIRA
(v6.2#6252)
