[ 
https://issues.apache.org/jira/browse/TAJO-742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyunsik Choi updated TAJO-742:
------------------------------

    Description: 
Currently, Tajo supports only equi-joins. In contrast, theta-joins (non-equality 
joins) are used in many real applications. We need to support theta-joins in 
Tajo.

If a join condition includes any predicate other than the equality predicate 
"=", we call the join a *theta join*. The predicates can include the following:
 * >, >=, <, <=, !=, LIKE, RLIKE, ...

Some of these predicates can exploit hash shuffle, range shuffle, or another 
shuffle method. Other predicates require that a single node process all 
intermediate data by using a block nested loop (BNL) join. Also, if a join 
condition is a mix of equi-join and theta-join conditions, the join can make 
use of hash shuffle. This issue requires some investigation.
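
As a rough illustration of the two strategies above, here is a minimal Python 
sketch. It is not Tajo code: the function names, the row layout, and the 
sample tables are hypothetical and exist only for this example. The first 
function evaluates an arbitrary predicate with a block nested loop, as a 
single node would. The second hashes on the equality key and applies the 
theta predicate as a residual filter, which is why a mixed condition could 
still use hash shuffle.

```python
from collections import defaultdict


def block_nested_loop_join(outer, inner, predicate, block_size=2):
    """Join two row lists under an arbitrary predicate (BNL-style)."""
    result = []
    for start in range(0, len(outer), block_size):
        block = outer[start:start + block_size]  # one in-memory block of outer rows
        for inner_row in inner:                  # inner side is scanned once per block
            for outer_row in block:
                if predicate(outer_row, inner_row):
                    result.append((outer_row, inner_row))
    return result


def hash_join_with_residual(left, right, key, residual):
    """Match rows on an equality key, then filter with a theta predicate."""
    buckets = defaultdict(list)
    for right_row in right:
        buckets[right_row[key]].append(right_row)
    result = []
    for left_row in left:
        # Only rows sharing the equality key are compared at all.
        for right_row in buckets.get(left_row[key], []):
            if residual(left_row, right_row):
                result.append((left_row, right_row))
    return result


# Hypothetical sample tables, for illustration only.
orders = [{"id": 1, "amount": 10}, {"id": 2, "amount": 50}, {"id": 3, "amount": 5}]
limits = [{"id": 1, "cap": 20}, {"id": 2, "cap": 30}, {"id": 3, "cap": 5}]


def over_cap(order, limit):
    """Theta predicate: orders.amount > limits.cap."""
    return order["amount"] > limit["cap"]


# Pure theta join: every pair of rows has to be compared.
theta_rows = block_nested_loop_join(orders, limits, over_cap)

# Mixed condition: orders.id = limits.id AND orders.amount > limits.cap.
mixed_rows = hash_join_with_residual(orders, limits, "id", over_cap)
```

In a distributed setting, the bucketing step of the second function would 
correspond to shuffling both inputs on the equality key, so each node only 
evaluates the theta predicate within its own partition.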

This is an umbrella issue. We'll create subtasks.

  was:
You can use various symbols for join conditions as follows:

||Symbol||Words||Example||
|=| equals|1 + 1 = 2|
|!=|not equal to|1 + 1 != 1|
|>|greater than|5 > 2|
|<|less than|7 < 9|
|>=|greater than or equal to|id >= 1|
|<=|less than or equal to|id <= 2|

Unfortunately, if you use any symbol other than the _equals_ symbol in a join 
condition, you will not get the expected result, because HashJoinExec and 
MergeJoinExec handle only the _equals_ sign.

Thus, we need to improve the join operators to handle these symbols. 
Additionally, we should update the shuffle and hashing code.


> Theta join support
> ------------------
>
>                 Key: TAJO-742
>                 URL: https://issues.apache.org/jira/browse/TAJO-742
>             Project: Tajo
>          Issue Type: Sub-task
>          Components: physical operator
>            Reporter: Jaehwa Jung
>            Assignee: Jaehwa Jung
>
> Currently, Tajo supports only equi-joins. In contrast, theta-joins 
> (non-equality joins) are used in many real applications. We need to support 
> theta-joins in Tajo.
> If a join condition includes any predicate other than the equality predicate 
> "=", we call the join a *theta join*. The predicates can include the following:
>  * >, >=, <, <=, !=, LIKE, RLIKE, ...
> Some of these predicates can exploit hash shuffle, range shuffle, or another 
> shuffle method. Other predicates require that a single node process all 
> intermediate data by using a block nested loop (BNL) join. Also, if a join 
> condition is a mix of equi-join and theta-join conditions, the join can make 
> use of hash shuffle. This issue requires some investigation.
> This is an umbrella issue. We'll create subtasks.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
