[ https://issues.apache.org/jira/browse/TAJO-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560219#comment-14560219 ]

ASF GitHub Bot commented on TAJO-1553:
--------------------------------------

Github user jihoonson commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/583#discussion_r31094037
  
    --- Diff: tajo-core/src/main/java/org/apache/tajo/engine/planner/global/ExecutionBlock.java ---
    @@ -39,8 +40,38 @@
     
       private boolean hasJoinPlan;
       private boolean hasUnionPlan;
    -
    -  private Set<String> broadcasted = new HashSet<String>();
    +  private boolean isUnionOnly;
    +
    +  private Map<String, ScanNode> broadcastRelations = TUtil.newHashMap();
    +
    +  /*
    +   * An execution block is null-supplying or preserved-row when its output is used as an input for outer join.
    +   * These flags are set according to the type of outer join.
    +   * Here are brief descriptions for these flags.
    +   *
    +   * 1) left outer join
    +   *
    +   *        left outer join
    +   *          /        \
    +   * preserved-row  null-supplying
    +   *
    +   * 2) right outer join
    +   *
    +   *        right outer join
    +   *          /        \
    +   * null-supplying  preserved-row
    +   *
    +   * 3) full outer join
    +   *
    +   *        full outer join
    +   *          /        \
    +   * null-supplying  preserved-row
    +   * preserved-row   null-supplying
    +   *
    +   * The null-supplying and preserved-row flags are used to find which relations will be broadcasted.
    +   */
    +  protected transient boolean nullSuppllying = false;
    --- End diff --
    
    Thanks. I'll remove. 
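
The flag assignment described in the diff comment above could be sketched roughly as follows. This is only an illustration of the three cases in that comment: the `OuterJoinFlagsSketch` class, the `OuterJoinType` enum, the `Flags` holder, and the `flagsFor` method are hypothetical names and are not taken from the Tajo code base.

```java
// Hypothetical sketch; none of these names come from the Tajo code base.
final class OuterJoinFlagsSketch {

  enum OuterJoinType { LEFT_OUTER, RIGHT_OUTER, FULL_OUTER }

  /** Flags for one input (child) of an outer join. */
  static final class Flags {
    final boolean preservedRow;
    final boolean nullSupplying;

    Flags(boolean preservedRow, boolean nullSupplying) {
      this.preservedRow = preservedRow;
      this.nullSupplying = nullSupplying;
    }
  }

  /** Returns the flags of the left child (isLeftChild = true) or the right child. */
  static Flags flagsFor(OuterJoinType type, boolean isLeftChild) {
    switch (type) {
      case LEFT_OUTER:
        // Left input keeps all of its rows; right input is padded with nulls.
        return isLeftChild ? new Flags(true, false) : new Flags(false, true);
      case RIGHT_OUTER:
        // Mirror image of the left outer join.
        return isLeftChild ? new Flags(false, true) : new Flags(true, false);
      case FULL_OUTER:
        // Both inputs are preserved-row and null-supplying at the same time.
        return new Flags(true, true);
      default:
        throw new IllegalArgumentException("Unknown join type: " + type);
    }
  }

  public static void main(String[] args) {
    Flags right = flagsFor(OuterJoinType.LEFT_OUTER, false);
    System.out.println("preservedRow=" + right.preservedRow
        + ", nullSupplying=" + right.nullSupplying);
  }
}
```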


> Improve broadcast join planning
> -------------------------------
>
>                 Key: TAJO-1553
>                 URL: https://issues.apache.org/jira/browse/TAJO-1553
>             Project: Tajo
>          Issue Type: Improvement
>          Components: distributed query plan, planner/optimizer
>            Reporter: Jihoon Son
>            Assignee: Jihoon Son
>             Fix For: 0.11.0
>
>
> The global engine generates a logical plan and then marks some parts of the 
> plan as broadcast plans, meaning that those parts and their inputs will be 
> broadcast to all workers. 
> Currently, broadcast parts are identified according to rigid, hard-coded 
> rules, which limits the broadcast opportunities in many cases.
> So, in this issue, I propose refactoring the broadcast planner to be more 
> general.
> Broadcast parts can be identified recursively:
> * A leaf node will be broadcast if its input size does not exceed the 
> pre-defined threshold.
> * An intermediate node will be broadcast if it has at least one broadcast 
> child.
> * For outer joins, row-preserved tables must not be broadcast, to avoid 
> input data duplication.
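
The three rules in the issue description can be read as a small recursive check. The sketch below is a minimal illustration under stated assumptions, not the planner's actual implementation: `BroadcastSketch`, `PlanNode`, `leaf`, `inner`, and `isBroadcast` are hypothetical names, and treating the outer-join rule as a veto that is checked before the other two rules is an interpretation, since the description does not say how the rules combine.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names come from the Tajo code base.
final class BroadcastSketch {

  static final class PlanNode {
    final String name;
    final long inputSizeBytes;   // only meaningful for leaf nodes
    final boolean preservedRow;  // true if this node is the row-preserved input of an outer join
    final List<PlanNode> children = new ArrayList<>();

    PlanNode(String name, long inputSizeBytes, boolean preservedRow) {
      this.name = name;
      this.inputSizeBytes = inputSizeBytes;
      this.preservedRow = preservedRow;
    }
  }

  static PlanNode leaf(String name, long inputSizeBytes, boolean preservedRow) {
    return new PlanNode(name, inputSizeBytes, preservedRow);
  }

  static PlanNode inner(String name, boolean preservedRow, PlanNode... kids) {
    PlanNode node = new PlanNode(name, 0L, preservedRow);
    for (PlanNode kid : kids) {
      node.children.add(kid);
    }
    return node;
  }

  static boolean isBroadcast(PlanNode node, long thresholdBytes) {
    // Assumed ordering: a row-preserved input is never broadcast.
    if (node.preservedRow) {
      return false;
    }
    // Leaf rule: broadcast if the input size does not exceed the threshold.
    if (node.children.isEmpty()) {
      return node.inputSizeBytes <= thresholdBytes;
    }
    // Intermediate rule: broadcast if at least one child is broadcast.
    for (PlanNode child : node.children) {
      if (isBroadcast(child, thresholdBytes)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    long threshold = 100L;
    PlanNode innerJoin = inner("join", false,
        leaf("fact", 1000L, false), leaf("dim", 10L, false));
    // Small table on the preserved side of a left outer join.
    PlanNode outerJoin = inner("left outer join", false,
        leaf("dim", 10L, true), leaf("fact", 1000L, false));
    System.out.println(isBroadcast(innerJoin, threshold)); // prints true
    System.out.println(isBroadcast(outerJoin, threshold)); // prints false
  }
}
```

In the second case the small table is under the threshold but sits on the row-preserved side, so the sketch refuses to broadcast it, which matches the third rule.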



