[ https://issues.apache.org/jira/browse/TAJO-742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyunsik Choi updated TAJO-742:
------------------------------
Description:
Currently, Tajo supports only equi-joins. In contrast, theta-joins (joins on
conditions other than equality) are used in many real applications. We need to
support theta-joins in Tajo.
If a join condition includes any predicate other than the equality predicate
"=", we call the join a *theta join*. Such predicates include:
* >, >=, <, <=, !=, LIKE, RLIKE, ...
Some of these predicates can exploit hash shuffle, range shuffle, or other
partitioning techniques. Others require a single node to process all
intermediate data with a block nested loop (BNL) join. Also, if a join
condition mixes equi-join and theta-join conditions, the join can still use
hash shuffle on its equality part. This issue requires some investigation.
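A minimal Java sketch of the two local strategies described above, assuming a hypothetical `Row` class with an integer `key` (used by an equality conjunct) and an integer `value` (used by a theta conjunct). None of these names are Tajo's physical operators. `blockNestedLoop` tests every pair, so it works for any predicate, which is why it is the fallback when no shuffle can co-locate matching rows. `hashJoinWithResidual` handles a mixed condition such as `l.key = r.key AND l.value < r.value`: the equality conjunct drives hashing, and the theta conjunct is checked as a residual filter.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

// Hypothetical row type for illustration only (not a Tajo class).
final class Row {
    final int key;   // column referenced by an equality conjunct
    final int value; // column referenced by a theta conjunct

    Row(int key, int value) {
        this.key = key;
        this.value = value;
    }
}

final class ThetaJoinSketch {
    // Block nested loop join: buffer one block of the outer input, then scan the
    // whole inner input once per block. Every pair is tested, so any predicate works.
    static List<Row[]> blockNestedLoop(List<Row> outer, List<Row> inner,
                                       int blockSize, BiPredicate<Row, Row> theta) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<Row[]> result = new ArrayList<>();
        for (int start = 0; start < outer.size(); start += blockSize) {
            List<Row> block = outer.subList(start, Math.min(start + blockSize, outer.size()));
            for (Row r : inner) {
                for (Row l : block) {
                    if (theta.test(l, r)) {
                        result.add(new Row[] {l, r});
                    }
                }
            }
        }
        return result;
    }

    // Mixed condition "l.key = r.key AND theta(l, r)": hash the left input on key,
    // probe with each right row, and apply theta as a residual filter.
    static List<Row[]> hashJoinWithResidual(List<Row> left, List<Row> right,
                                            BiPredicate<Row, Row> theta) {
        Map<Integer, List<Row>> table = new HashMap<>();
        for (Row l : left) {
            table.computeIfAbsent(l.key, k -> new ArrayList<>()).add(l);
        }
        List<Row[]> result = new ArrayList<>();
        for (Row r : right) {
            for (Row l : table.getOrDefault(r.key, Collections.emptyList())) {
                if (theta.test(l, r)) {
                    result.add(new Row[] {l, r});
                }
            }
        }
        return result;
    }
}
```

For example, `blockNestedLoop(left, right, 2, (l, r) -> l.value < r.value)` evaluates a pure inequality join. In a distributed plan, the mixed case could hash-shuffle both inputs on `key` so that each node runs `hashJoinWithResidual` on its own partition.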
This is an umbrella issue. We'll create subtasks.
was:
You can use various symbols for join conditions as follows:
||Symbol||Meaning||Example||
|=|equals|1 + 1 = 2|
|!=|not equal to|1 + 1 != 1|
|>|greater than|5 > 2|
|<|less than|7 < 9|
|>=|greater than or equal to|id >= 1|
|<=|less than or equal to|id <= 2|
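A small Java sketch of how a join operator might evaluate the symbols in the table, assuming the join keys implement `Comparable`. The `CompOp` enum and its `fromSymbol` lookup are hypothetical names, not part of Tajo. Every symbol reduces to a check on the sign of a single `compareTo` result.

```java
import java.util.function.IntPredicate;

// Hypothetical comparison operators for join conditions (not a Tajo API).
enum CompOp {
    EQ("=", c -> c == 0),
    NE("!=", c -> c != 0),
    GT(">", c -> c > 0),
    LT("<", c -> c < 0),
    GE(">=", c -> c >= 0),
    LE("<=", c -> c <= 0);

    final String symbol;
    private final IntPredicate onCompare; // decides the predicate from compareTo's sign

    CompOp(String symbol, IntPredicate onCompare) {
        this.symbol = symbol;
        this.onCompare = onCompare;
    }

    <T extends Comparable<T>> boolean test(T left, T right) {
        return onCompare.test(left.compareTo(right));
    }

    static CompOp fromSymbol(String symbol) {
        for (CompOp op : values()) {
            if (op.symbol.equals(symbol)) {
                return op;
            }
        }
        throw new IllegalArgumentException("unsupported join symbol: " + symbol);
    }
}
```

Evaluating a predicate is the easy part; the harder question is how to bring candidate rows together, since `!=` accepts almost every pair and gives no hint about which rows can match.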
Unfortunately, if you use any symbol other than the _equals_ symbol in a join
condition, you do not get the expected result, because HashJoinExec and
MergeJoinExec only handle the _equals_ sign.
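To see why an equality-only hash join cannot answer a theta condition, consider this simplified Java model (hypothetical names, not Tajo's HashJoinExec code). The build side is a hash table keyed by the join column, so probing with key 5 can only return rows whose key is exactly 5. A condition such as `5 < r.key` needs the row with key 7, which only a scan over the build side finds.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class HashProbeDemo {
    // Build side of a hash join: join-key value -> rows (here just the keys themselves).
    static Map<Integer, List<Integer>> build(int... keys) {
        Map<Integer, List<Integer>> table = new HashMap<>();
        for (int k : keys) {
            table.computeIfAbsent(k, x -> new ArrayList<>()).add(k);
        }
        return table;
    }

    // Probe: an exact-match lookup, so only "=" matches are ever returned.
    static List<Integer> probe(Map<Integer, List<Integer>> table, int key) {
        return table.getOrDefault(key, Collections.emptyList());
    }

    // What "probeKey < buildKey" actually needs: examine every build-side row.
    static List<Integer> scanGreaterThan(int[] keys, int probeKey) {
        List<Integer> out = new ArrayList<>();
        for (int k : keys) {
            if (probeKey < k) {
                out.add(k);
            }
        }
        return out;
    }
}
```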
Thus, we need to improve the join operators to handle the other symbols. We
should also update the shuffle and hashing code.
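One way the shuffle could support an inequality join such as `l.a < r.b` is range partitioning. The following Java sketch is an assumption about that approach, not existing Tajo code; `splits` is a hypothetical sorted array of partition boundaries. Every right row lands in exactly one partition, and each left row is copied to its own partition and every later one, because any matching `b > a` can only live there. Each partition is then joined locally, and every matching pair is produced exactly once.

```java
import java.util.ArrayList;
import java.util.List;

final class RangeShuffleSketch {
    // Partition p holds values in [splits[p-1], splits[p]); partition 0 is unbounded
    // below and the last partition (index splits.length) is unbounded above.
    // splits must be sorted ascending.
    static int partitionOf(int v, int[] splits) {
        int p = 0;
        while (p < splits.length && v >= splits[p]) {
            p++;
        }
        return p;
    }

    // Evaluate "a < b" by range-shuffling both inputs, then joining each partition locally.
    static List<List<int[]>> lessThanJoin(int[] left, int[] right, int[] splits) {
        int n = splits.length + 1;
        List<List<Integer>> leftParts = new ArrayList<>();
        List<List<Integer>> rightParts = new ArrayList<>();
        for (int p = 0; p < n; p++) {
            leftParts.add(new ArrayList<>());
            rightParts.add(new ArrayList<>());
        }
        // Each right row goes to exactly one partition, so no pair is emitted twice.
        for (int b : right) {
            rightParts.get(partitionOf(b, splits)).add(b);
        }
        // A left row can only match b > a; such b lie in a's partition or a later one.
        for (int a : left) {
            for (int p = partitionOf(a, splits); p < n; p++) {
                leftParts.get(p).add(a);
            }
        }
        // Local nested-loop join inside each partition.
        List<List<int[]>> result = new ArrayList<>();
        for (int p = 0; p < n; p++) {
            List<int[]> local = new ArrayList<>();
            for (int a : leftParts.get(p)) {
                for (int b : rightParts.get(p)) {
                    if (a < b) {
                        local.add(new int[] {a, b});
                    }
                }
            }
            result.add(local);
        }
        return result;
    }
}
```

The cost is replication: left rows with small values are copied to many partitions, and poorly chosen boundaries can overload one partition. A predicate such as `!=` allows no such pruning, which is where a single-node BNL fallback would apply.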
> Theta join support
> ------------------
>
> Key: TAJO-742
> URL: https://issues.apache.org/jira/browse/TAJO-742
> Project: Tajo
> Issue Type: Sub-task
> Components: physical operator
> Reporter: Jaehwa Jung
> Assignee: Jaehwa Jung
>
--
This message was sent by Atlassian JIRA
(v6.2#6252)