[ https://issues.apache.org/jira/browse/TAJO-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560194#comment-14560194 ]
ASF GitHub Bot commented on TAJO-1553:
--------------------------------------
Github user hyunsik commented on a diff in the pull request:
https://github.com/apache/tajo/pull/583#discussion_r31093213
--- Diff: tajo-core/src/main/java/org/apache/tajo/engine/planner/global/ExecutionBlock.java ---
@@ -39,8 +40,38 @@
private boolean hasJoinPlan;
private boolean hasUnionPlan;
-
- private Set<String> broadcasted = new HashSet<String>();
+ private boolean isUnionOnly;
+
+ private Map<String, ScanNode> broadcastRelations = TUtil.newHashMap();
+
+ /*
+ * An execution block is null-supplying or preserved-row when its output is used as an input for outer join.
+ * These flags are set according to the type of outer join.
+ * Here are brief descriptions for these flags.
+ *
+ * 1) left outer join
+ *
+ *            left outer join
+ *             /           \
+ *     preserved-row   null-supplying
+ *
+ * 2) right outer join
+ *
+ *           right outer join
+ *             /           \
+ *    null-supplying   preserved-row
+ *
+ * 3) full outer join
+ *
+ *            full outer join
+ *             /           \
+ *    null-supplying   preserved-row
+ *     preserved-row   null-supplying
+ *
+ * The null-supplying and preserved-row flags are used to find which relations will be broadcasted.
+ */
+ protected transient boolean nullSuppllying = false;
--- End diff --
'transient' does not seem to be necessary.
> Improve broadcast join planning
> -------------------------------
>
> Key: TAJO-1553
> URL: https://issues.apache.org/jira/browse/TAJO-1553
> Project: Tajo
> Issue Type: Improvement
> Components: distributed query plan, planner/optimizer
> Reporter: Jihoon Son
> Assignee: Jihoon Son
> Fix For: 0.11.0
>
>
> The global engine generates a logical plan and then marks some parts of the
> plan as broadcast plans, meaning that they and their inputs will be
> broadcast to all workers.
> Currently, broadcast parts are identified by rigid, hard-coded rules, which
> limits broadcast opportunities in many cases.
> So, in this issue, I propose refactoring the broadcast planner to be more
> general.
> Broadcast parts can be identified recursively:
> * A leaf node will be broadcast if its input size does not exceed the
> pre-defined threshold.
> * An intermediate node will be broadcast if it has at least one broadcast
> child.
> * For outer joins, preserved-row tables must not be broadcast, to avoid
> input data duplication.
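
The three recursive rules above can be sketched roughly as follows. This is only an illustration of one possible reading of the proposal, not Tajo's planner code: the class `BroadcastSketch`, the `Node` type, the `mark` method and the byte threshold are all hypothetical names introduced here. The sketch assumes that a node is preserved-row only with respect to its direct parent outer join, and that a preserved-row node is never broadcast even if it would otherwise qualify.

```java
// Hypothetical sketch; none of these types exist in Tajo.
class BroadcastSketch {
  enum JoinType { INNER, LEFT_OUTER, RIGHT_OUTER, FULL_OUTER }

  static final class Node {
    final String name;
    final long sizeBytes;     // input size, used for leaves only
    final JoinType joinType;  // null for leaf (scan) nodes
    final Node left;
    final Node right;
    boolean broadcast;        // result of the marking pass

    private Node(String name, long sizeBytes, JoinType joinType, Node left, Node right) {
      this.name = name;
      this.sizeBytes = sizeBytes;
      this.joinType = joinType;
      this.left = left;
      this.right = right;
    }

    static Node leaf(String name, long sizeBytes) {
      return new Node(name, sizeBytes, null, null, null);
    }

    static Node join(JoinType type, Node left, Node right) {
      return new Node(type + "(" + left.name + "," + right.name + ")", 0L, type, left, right);
    }

    boolean isLeaf() {
      return joinType == null;
    }
  }

  // Marks the subtree rooted at 'node' and returns whether 'node' is broadcast.
  // 'preservedRow' says whether 'node' is on the preserved-row side of its parent join.
  static boolean mark(Node node, long thresholdBytes, boolean preservedRow) {
    boolean candidate;
    if (node.isLeaf()) {
      // Rule 1: a leaf qualifies if its input size does not exceed the threshold.
      candidate = node.sizeBytes <= thresholdBytes;
    } else {
      // Preserved-row sides per join type: left for LEFT_OUTER, right for
      // RIGHT_OUTER, both for FULL_OUTER, none for INNER.
      boolean leftPreserved =
          node.joinType == JoinType.LEFT_OUTER || node.joinType == JoinType.FULL_OUTER;
      boolean rightPreserved =
          node.joinType == JoinType.RIGHT_OUTER || node.joinType == JoinType.FULL_OUTER;
      boolean leftBroadcast = mark(node.left, thresholdBytes, leftPreserved);
      boolean rightBroadcast = mark(node.right, thresholdBytes, rightPreserved);
      // Rule 2: an intermediate node qualifies if at least one child is broadcast.
      candidate = leftBroadcast || rightBroadcast;
    }
    // Rule 3 (assumed reading): a preserved-row node is never broadcast.
    node.broadcast = candidate && !preservedRow;
    return node.broadcast;
  }

  public static void main(String[] args) {
    Node big = Node.leaf("big", 1000L);
    Node small = Node.leaf("small", 10L);
    Node plan = Node.join(JoinType.LEFT_OUTER, big, small);
    mark(plan, 100L, false);
    for (Node n : new Node[] {big, small, plan}) {
      System.out.println(n.name + " broadcast=" + n.broadcast);
    }
  }
}
```

With this reading, swapping the two inputs of the left outer join in `main` puts the small table on the preserved-row side, so nothing in the plan is marked for broadcast. A full outer join never broadcasts either input, since both sides are preserved-row.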