[ 
https://issues.apache.org/jira/browse/DRILL-7043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772590#comment-16772590
 ] 

Boaz Ben-Zvi edited comment on DRILL-7043 at 2/20/19 3:32 AM:
--------------------------------------------------------------

This enhancement is becoming more useful as our storage begins to support 
"sortedness" - e.g., Secondary Indexes, and future Parquet Metadata (e.g., 
taken from Hive). A Merge-Join on two already-sorted tables generally 
out-performs a Hash-Join, since it needs no hash-table build.

was (Author: ben-zvi):
This enhancement is becoming more useful as our storage begins to support 
"sortedness" - e.g., Secondary Indexes, and future Parquet Metadata (e.g., 
taken from Hive).

> Enhance Merge-Join to support Full Outer Join
> ---------------------------------------------
>
>                 Key: DRILL-7043
>                 URL: https://issues.apache.org/jira/browse/DRILL-7043
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Relational Operators, Query Planning & 
> Optimization
>    Affects Versions: 1.15.0
>            Reporter: Boaz Ben-Zvi
>            Assignee: Boaz Ben-Zvi
>            Priority: Major
>
>    Currently the Merge Join (MJ) operator cannot internally support a Right 
> Outer Join (ROJ), and thus cannot support a Full Outer Join (FOJ) either. For 
> ROJ alone, the planner rotates the inputs and specifies a Left Outer Join.
>    The actual reason for not supporting ROJ is the current MJ implementation: 
> when a match is found, it puts a mark on the right side and iterates down 
> on the right, resetting back to the mark at the end (and moving on to the 
> next left side entry). This creates an ambiguity if the next left entry is 
> bigger than the previous one: is the marked right entry unmatched (i.e., it 
> needs to be returned), or was there a prior match (i.e., just advance to the 
> next right entry)?
>    It seems that adding a relevant flag to the persisted state ({{status}}) and 
> some other code changes would make the operator support Right Outer Join as 
> well (and thus a Full Outer Join).  The planner needs an update as well - to 
> suggest the MJ in case of a FOJ, and maybe not to rotate the inputs in some 
> MJ cases.
>    Currently, trying a FOJ with MJ (i.e., with Hash Join (HJ) disabled) 
> produces the following "no plan found" error from Calcite:
> {noformat}
> 0: jdbc:drill:zk=local> select * from temp t1 full outer join temp2 t2 on 
> t1.d_date = t2.d_date;
> Error: SYSTEM ERROR: CannotPlanException: Node 
> [rel#2804:Subset#8.PHYSICAL.SINGLETON([]).[]] could not be implemented; 
> planner state:
> Root: rel#2804:Subset#8.PHYSICAL.SINGLETON([]).[]
> Original rel:
> DrillScreenRel(subset=[rel#2804:Subset#8.PHYSICAL.SINGLETON([]).[]]): 
> rowcount = 6.0, cumulative cost = {0.6000000000000001 rows, 
> 0.6000000000000001 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 2802
>   DrillProjectRel(subset=[rel#2801:Subset#7.LOGICAL.ANY([]).[]], **=[$0], 
> **0=[$2]): rowcount = 6.0, cumulative cost = {6.0 rows, 12.0 cpu, 0.0 io, 0.0 
> network, 0.0 memory}, id = 2800
>     DrillJoinRel(subset=[rel#2799:Subset#6.LOGICAL.ANY([]).[]], 
> condition=[=($1, $3)], joinType=[full]): rowcount = 6.0, cumulative cost = 
> {10.0 rows, 104.0 cpu, 0.0 io, 0.0 network, 70.4 memory}, id = 2798
> {noformat}
>  
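
The mark-and-reset behavior and the proposed "matched" flag from the 
description above can be illustrated with a small standalone sketch. This is 
not Drill's operator code: the function name, the `mark_matched` flag and the 
row layout (lists of `(key, value)` tuples sorted by non-null keys) are all 
assumptions made for illustration only.

```python
def merge_full_outer_join(left, right):
    """Full outer join of two lists of (key, value) rows, both sorted by key."""
    out = []
    r = 0                 # right-side cursor; doubles as the "mark" of the current group
    mark_matched = False  # hypothetical flag: was the group at the mark ever matched?

    def drain_right(stop_key):
        # Advance past right groups with key < stop_key (all groups if None),
        # emitting only the groups that never matched a left row.
        nonlocal r, mark_matched
        while r < len(right) and (stop_key is None or right[r][0] < stop_key):
            key = right[r][0]
            while r < len(right) and right[r][0] == key:
                if not mark_matched:
                    out.append((None, right[r]))  # unmatched right row
                r += 1
            mark_matched = False  # the next group starts out unmatched

    for lrow in left:
        drain_right(lrow[0])
        if r < len(right) and right[r][0] == lrow[0]:
            i = r  # iterate down from the mark; r itself stays at the mark
            while i < len(right) and right[i][0] == lrow[0]:
                out.append((lrow, right[i]))
                i += 1
            mark_matched = True  # resolves the ambiguity for the next left row
        else:
            out.append((lrow, None))  # unmatched left row
    drain_right(None)  # flush whatever remains on the right
    return out
```

Without the flag, the sketch could not tell whether the right rows at the 
mark should be returned as unmatched or silently skipped once the left key 
grows, which is the ambiguity the description points out.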



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
