[
https://issues.apache.org/jira/browse/DRILL-7043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772590#comment-16772590
]
Boaz Ben-Zvi edited comment on DRILL-7043 at 2/20/19 3:32 AM:
--------------------------------------------------------------
This enhancement is becoming more useful as our storage begins to support
"sortedness" - e.g., Secondary Indexes, and future Parquet Metadata (e.g.,
taken from Hive). A Merge-Join on two sorted tables always out-performs a
Hash-Join.
was (Author: ben-zvi):
This enhancement is becoming more useful as our storage begins to support
"sortedness" - e.g., Secondary Indexes, and future Parquet Metadata (e.g.,
taken from Hive).
> Enhance Merge-Join to support Full Outer Join
> ---------------------------------------------
>
> Key: DRILL-7043
> URL: https://issues.apache.org/jira/browse/DRILL-7043
> Project: Apache Drill
> Issue Type: Improvement
> Components: Execution - Relational Operators, Query Planning &
> Optimization
> Affects Versions: 1.15.0
> Reporter: Boaz Ben-Zvi
> Assignee: Boaz Ben-Zvi
> Priority: Major
>
> Currently the Merge Join operator internally cannot support a Right Outer
> Join (and thus a Full Outer Join; for ROJ alone, the planner rotates the
> inputs and specifies a Left Outer Join).
> The actual reason for not supporting ROJ is the current MJ implementation:
> when a match is found, it puts a mark on the right side and iterates down
> on the right, resetting back to the mark at the end (and moving on to the
> next left side entry).
> This would create an ambiguity when the next left entry is bigger than the
> previous one: is the marked right entry unmatched (i.e., it needs to be
> returned), or was there a prior match (i.e., just advance to the next right
> entry)?
> It seems that adding a relevant flag to the persisted state ({{status}}) and
> some other code changes would make the operator support Right-Outer-Join as
> well (and thus a Full Outer Join). The planner needs an update as well: to
> suggest the MJ in case of a FOJ, and maybe not to rotate the inputs in some
> MJ cases.
> Currently, trying a FOJ with MJ (i.e., HJ disabled) produces the following
> "no plan found" error from Calcite:
> {noformat}
> 0: jdbc:drill:zk=local> select * from temp t1 full outer join temp2 t2 on
> t1.d_date = t2.d_date;
> Error: SYSTEM ERROR: CannotPlanException: Node
> [rel#2804:Subset#8.PHYSICAL.SINGLETON([]).[]] could not be implemented;
> planner state:
> Root: rel#2804:Subset#8.PHYSICAL.SINGLETON([]).[]
> Original rel:
> DrillScreenRel(subset=[rel#2804:Subset#8.PHYSICAL.SINGLETON([]).[]]):
> rowcount = 6.0, cumulative cost = {0.6000000000000001 rows,
> 0.6000000000000001 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 2802
> DrillProjectRel(subset=[rel#2801:Subset#7.LOGICAL.ANY([]).[]], **=[$0],
> **0=[$2]): rowcount = 6.0, cumulative cost = {6.0 rows, 12.0 cpu, 0.0 io, 0.0
> network, 0.0 memory}, id = 2800
> DrillJoinRel(subset=[rel#2799:Subset#6.LOGICAL.ANY([]).[]],
> condition=[=($1, $3)], joinType=[full]): rowcount = 6.0, cumulative cost =
> {10.0 rows, 104.0 cpu, 0.0 io, 0.0 network, 70.4 memory}, id = 2798
> {noformat}
>
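The mark-and-reset ambiguity described in the issue can be illustrated with a small standalone sketch. This is not Drill's Merge Join code: the function name, the in-memory lists of (key, value) tuples, and the per-row `matched` list are all assumptions made for illustration, with `matched` standing in for the "relevant flag" the description proposes adding to the persisted state. Keys are assumed to be non-null and both inputs sorted ascending by key.

```python
def merge_full_outer_join(left, right):
    """Full outer merge join over two lists of (key, value) sorted by key.

    Illustrative sketch only. Returns a list of (left_row, right_row)
    pairs, where the missing side of an unmatched row is None.
    """
    out = []
    mark = 0                          # start of the current right-side run
    matched = [False] * len(right)    # hypothetical "was matched" flag per right row

    for left_row in left:
        lkey = left_row[0]

        # Right rows with smaller keys can never match a later left row.
        # The flag tells us whether to emit them as unmatched or skip them.
        while mark < len(right) and right[mark][0] < lkey:
            if not matched[mark]:
                out.append((None, right[mark]))
            mark += 1

        # Iterate down the right side from the mark over equal keys.
        # Leaving `mark` untouched plays the role of the reset, so the
        # next left row with the same key rescans the same run.
        j = mark
        found = False
        while j < len(right) and right[j][0] == lkey:
            out.append((left_row, right[j]))
            matched[j] = True
            found = True
            j += 1

        if not found:
            out.append((left_row, None))  # left outer part

    # Drain the right side: anything never matched is a right outer row.
    while mark < len(right):
        if not matched[mark]:
            out.append((None, right[mark]))
        mark += 1
    return out
```

For example, with left keys 1, 2, 2, 4 and right keys 2, 2, 3, 5, the two right rows with key 2 are skipped once the left side advances to 4 because their flag is set, while the rows with keys 3 and 5 are emitted with a null left side. Without the flag, the operator could not tell those two cases apart when it moves past the mark.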