[ 
https://issues.apache.org/jira/browse/TAJO-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14558176#comment-14558176
 ] 

ASF GitHub Bot commented on TAJO-1553:
--------------------------------------

GitHub user jihoonson opened a pull request:

    https://github.com/apache/tajo/pull/583

    TAJO-1553: Improve broadcast join planning

    Sorry for the large patch, but most changes are related to unit tests.
    
    In this patch, I've added `BroadcastJoinRule` as a new 
`GlobalPlanRewriteRule`.
    `BroadcastJoinRule` converts a repartition join plan into a broadcast join 
plan. To describe the broadcast join rules, we first have to define the 
`broadcastable` property of a relation as follows.
    
    _Broadcastable relation:_ A relation is broadcastable when its size is 
smaller than a given threshold.
    
    I've also assumed that if every input of an execution block (EB) is 
broadcastable, the output of that execution block is also broadcastable.
    
    Finally, here are the rules to convert repartition join into broadcast join.
    
    * Given an EB containing a join and its child EBs, those EBs can be merged 
into a single EB if at least one child EB's output is broadcastable.
    * Given a user-defined threshold, the total size of the broadcast relations 
of an EB cannot exceed that threshold.
     * After merging EBs according to the first rule, the resulting EB may not 
satisfy the second rule. In this case, enforce repartition join for large 
relations to satisfy the second rule.
    * Preserved-row relations cannot be broadcast, to avoid duplicated 
results. In particular, a full outer join cannot be executed as a broadcast 
join, because both of its inputs are preserved-row relations.
     * Here is some brief background for this rule. Data of preserved-row 
relations appears in the join result regardless of the join conditions. If 
multiple tasks execute an outer join with broadcast preserved-row relations, 
they emit duplicate results.
     * Even though a single task can execute an outer join when every input is 
broadcastable, broadcast join is not allowed if one of the input relations 
consists of multiple files.
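
    The size-related rules above can be sketched roughly as follows. This is only an illustration, not the code in this patch: the class `BroadcastSelectionSketch`, the `Relation` type, the `chooseBroadcast` method, and the table names and sizes are all hypothetical. The smallest-first greedy order is also an assumption, since the rules only say that large relations fall back to repartition join.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: none of these names are taken from the Tajo code base.
final class BroadcastSelectionSketch {

    /** One input relation of a merged execution block. */
    static final class Relation {
        final String name;
        final long sizeBytes;        // estimated size of the relation
        final boolean preservedRow;  // true for a preserved-row side of an outer join

        Relation(String name, long sizeBytes, boolean preservedRow) {
            this.name = name;
            this.sizeBytes = sizeBytes;
            this.preservedRow = preservedRow;
        }
    }

    /**
     * Returns the names of the relations to broadcast. Every relation that is
     * not returned is assumed to be handled by a repartition join instead.
     */
    static List<String> chooseBroadcast(List<Relation> inputs, long thresholdBytes) {
        List<Relation> candidates = new ArrayList<>();
        for (Relation r : inputs) {
            // Broadcastable: smaller than the threshold.
            // Preserved-row relations are never broadcast.
            if (!r.preservedRow && r.sizeBytes < thresholdBytes) {
                candidates.add(r);
            }
        }
        // Assumption: prefer the smallest relations so that as many as possible fit.
        candidates.sort(Comparator.comparingLong((Relation r) -> r.sizeBytes));

        List<String> chosen = new ArrayList<>();
        long total = 0;
        for (Relation r : candidates) {
            // The total size of broadcast relations must not exceed the threshold.
            if (total + r.sizeBytes > thresholdBytes) {
                break;
            }
            chosen.add(r.name);
            total += r.sizeBytes;
        }
        return chosen;
    }

    public static void main(String[] args) {
        // Made-up relations and sizes, with a threshold of 100 bytes.
        List<Relation> inputs = Arrays.asList(
            new Relation("nation", 10, false),
            new Relation("region", 30, false),
            new Relation("supplier", 70, false),
            new Relation("customer", 5, true),
            new Relation("lineitem", 200, false));
        System.out.println(chooseBroadcast(inputs, 100)); // prints [nation, region]
    }
}
```

    In this example, `supplier` is individually broadcastable but would push the total to 110, so it falls back to repartition join. `customer` is excluded because it is a preserved-row relation, and `lineitem` because it is above the threshold.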

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jihoonson/tajo-2 TAJO-1553

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/tajo/pull/583.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #583
    
----
commit 7d72e8bef78abbeddb96dda20cd93937c0983f4a
Author: Jihoon Son <[email protected]>
Date:   2015-04-16T09:47:59Z

    TAJO-1553

commit 62f8ec79508cbc34273546d0e4103cd2d36348d4
Author: Jihoon Son <[email protected]>
Date:   2015-04-16T12:21:33Z

    TAJO-1553

commit a47a6025b753e2c2e4c8eb5b486657fd2b2c8d2f
Author: Jihoon Son <[email protected]>
Date:   2015-04-17T06:22:51Z

    TAJO-1553

commit 1dfee64762eacd2c2ed1f1e82fd91e6b721a2a91
Author: Jihoon Son <[email protected]>
Date:   2015-04-20T05:09:54Z

    Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo into 
TAJO-1553

commit 5feaeac20bc38ba3846f0da2e8267c2c164c0ada
Author: Jihoon Son <[email protected]>
Date:   2015-04-20T05:59:55Z

    TAJO-1553

commit 81c1318ae4e6f0d440c1d57403aa5f0730a9c434
Author: Jihoon Son <[email protected]>
Date:   2015-04-20T07:19:27Z

    TAJO-1553

commit 78c7222c523c76106a76c50565509ab056d2867e
Author: Jihoon Son <[email protected]>
Date:   2015-04-27T04:40:49Z

    Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo into 
TAJO-1553

commit 1d97567a382308c5534284a8a796257818c0d79f
Author: Jihoon Son <[email protected]>
Date:   2015-04-27T09:30:43Z

    Merge branch 'TAJO-1553' of https://github.com/jihoonson/tajo-2 into 
TAJO-1553

commit 5f43b4e0b4a92d79ce0911d0309f9710d75815dd
Author: Jihoon Son <[email protected]>
Date:   2015-04-27T10:51:17Z

    TAJO-1553

commit 70905ddc28473f8049aa845de1fdd275d223cce4
Author: Jihoon Son <[email protected]>
Date:   2015-04-27T13:51:08Z

    Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo into 
TAJO-1553

commit ae4bc8ec4c05f150760e3dbfc975a2f4a2dec9c8
Author: Jihoon Son <[email protected]>
Date:   2015-04-28T08:58:25Z

    TAJO-1553

commit b4d3d2f64475197a85dd285a488aa37572e4b293
Author: Jihoon Son <[email protected]>
Date:   2015-04-28T10:18:20Z

    TAJO-1553

commit 722c62e566c4127948c507b9c5ba82b5f2e15fac
Author: Jihoon Son <[email protected]>
Date:   2015-04-29T02:22:44Z

    Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo into 
TAJO-1553
    
    Conflicts:
        
tajo-plan/src/main/java/org/apache/tajo/plan/expr/AggregationFunctionCallEval.java

commit 068b0ee150f1284600980bf6702760b00b45f252
Author: Jihoon Son <[email protected]>
Date:   2015-04-29T02:46:09Z

    TAJO-1553

commit 7938077be9d955faee0f6b3ccfdc64777d9b37f1
Author: Jihoon Son <[email protected]>
Date:   2015-04-29T10:06:53Z

    Fix aggregation problem

commit 21df329b03bab523ba448acf18515c9d4a5e591a
Author: Jihoon Son <[email protected]>
Date:   2015-04-30T10:43:19Z

    TAJO-1553

commit ebf12b5c2a78c75c068edd64eee6ffa093889a15
Author: Jihoon Son <[email protected]>
Date:   2015-05-04T03:42:18Z

    Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo into 
TAJO-1553

commit 06d725c222f5ad210c2a822b009bf22fcd434dd3
Author: Jihoon Son <[email protected]>
Date:   2015-05-04T13:23:41Z

    Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo into 
TAJO-1553

commit 0af98050a7e2df70af92b6cf91e805cb1b7ce663
Author: Jihoon Son <[email protected]>
Date:   2015-05-04T13:23:51Z

    TAJO-1553

commit bb72132627824c5e9a54de537f6808d94bf5635e
Author: Jihoon Son <[email protected]>
Date:   2015-05-04T13:25:46Z

    Merge branch 'TAJO-1553' of https://github.com/jihoonson/tajo-2 into 
TAJO-1553

commit b22fe962b7ed82ba0606f8a7236d68906406cd8b
Author: Jihoon Son <[email protected]>
Date:   2015-05-05T04:07:30Z

    TAJO-1553

commit 4cb67c8ff15256ad0b482dbd7ab8d30858006228
Author: Jihoon Son <[email protected]>
Date:   2015-05-05T13:15:20Z

    Fix distinct aggregation bug

commit edf427abffd14e30b1ae0d87496258df30d9f798
Author: Jihoon Son <[email protected]>
Date:   2015-05-06T01:58:01Z

    Merge branch 'TAJO-1553' of https://github.com/jihoonson/tajo-2 into 
TAJO-1553

commit 7598a737c0da5111200290992c8660b753fd27cc
Author: Jihoon Son <[email protected]>
Date:   2015-05-06T09:09:42Z

    TAJO-1553

commit 1498534b48c88d8960583d9d185e76a631b882b4
Author: Jihoon Son <[email protected]>
Date:   2015-05-06T09:50:21Z

    TAJO-1553

commit 8879f8c20ea9b8cfe1b0b1b187632740a7de4285
Author: Jihoon Son <[email protected]>
Date:   2015-05-07T14:04:12Z

    TAJO-1553

commit f557ac4d8111f1fda2937b72bf2ebedaf9537d0b
Author: Jihoon Son <[email protected]>
Date:   2015-05-08T03:56:11Z

    Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo into 
TAJO-1553

commit 5fce06404c2bb0e90303c4833f6cff755e76e6eb
Author: Jihoon Son <[email protected]>
Date:   2015-05-08T03:57:25Z

    Merge branch 'TAJO-1553' of https://github.com/jihoonson/tajo-2 into 
TAJO-1553

commit b2ff12bee38524a0dd47171719b4a9855d09a895
Author: Jihoon Son <[email protected]>
Date:   2015-05-08T03:57:32Z

    Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo into 
TAJO-1553

commit a6d322ca1307e83e33742fb8600b70b952f50cf7
Author: Jihoon Son <[email protected]>
Date:   2015-05-08T06:03:13Z

    TAJO-1553

----


> Improve broadcast join planning
> -------------------------------
>
>                 Key: TAJO-1553
>                 URL: https://issues.apache.org/jira/browse/TAJO-1553
>             Project: Tajo
>          Issue Type: Improvement
>          Components: distributed query plan, planner/optimizer
>            Reporter: Jihoon Son
>            Assignee: Jihoon Son
>             Fix For: 0.11.0
>
>
> The global engine generates a logical plan, and then marks some parts of the 
> plan as broadcast plans, which means that they and their inputs will be 
> broadcast to all workers. 
> Currently, broadcast parts are identified according to rigid, hard-coded 
> rules, which limits the broadcast opportunities in many cases.
> So, in this issue, I propose refactoring the broadcast planner to be more 
> general.
> Broadcast parts can be identified recursively:
> * A leaf node will be broadcast if its input size does not exceed the 
> pre-defined threshold.
> * An intermediate node will be broadcast if it has at least one broadcast 
> child.
> * For outer joins, row-preserved tables must not be broadcast, to avoid 
> input data duplication.
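
The recursive identification proposed in the issue description could look roughly like the sketch below. It is not Tajo code: `BroadcastMarkingSketch`, `PlanNode`, and `isBroadcast` are hypothetical names. Treating row preservation as a flag on the leaf is a simplifying assumption, because in a real plan it depends on the outer join that consumes the table.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: PlanNode and isBroadcast are illustrative names, not Tajo APIs.
final class BroadcastMarkingSketch {

    static final class PlanNode {
        final String name;
        final long inputSizeBytes;     // only meaningful for leaf nodes
        final boolean rowPreserved;    // leaf is a row-preserved side of an outer join
        final List<PlanNode> children; // empty for leaf nodes

        private PlanNode(String name, long inputSizeBytes, boolean rowPreserved,
                         List<PlanNode> children) {
            this.name = name;
            this.inputSizeBytes = inputSizeBytes;
            this.rowPreserved = rowPreserved;
            this.children = children;
        }

        static PlanNode leaf(String name, long inputSizeBytes, boolean rowPreserved) {
            return new PlanNode(name, inputSizeBytes, rowPreserved,
                                Collections.<PlanNode>emptyList());
        }

        static PlanNode inner(String name, PlanNode... children) {
            return new PlanNode(name, 0L, false, Arrays.asList(children));
        }
    }

    /** Applies the three rules from the issue description recursively. */
    static boolean isBroadcast(PlanNode node, long thresholdBytes) {
        if (node.children.isEmpty()) {
            // Leaf: broadcast if its input size does not exceed the threshold,
            // unless it is a row-preserved table of an outer join.
            return !node.rowPreserved && node.inputSizeBytes <= thresholdBytes;
        }
        // Intermediate node: broadcast if at least one child is broadcast.
        for (PlanNode child : node.children) {
            if (isBroadcast(child, thresholdBytes)) {
                return true;
            }
        }
        return false;
    }
}
```

With a threshold of 100 bytes, a join over a 10-byte leaf and a 1000-byte leaf is marked as broadcast, while a join over two 1000-byte leaves is not.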



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
