[ https://issues.apache.org/jira/browse/BEAM-6350?focusedWorklogId=183027&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-183027 ]
ASF GitHub Bot logged work on BEAM-6350:
----------------------------------------
Author: ASF GitHub Bot
Created on: 09/Jan/19 11:22
Start Date: 09/Jan/19 11:22
Worklog Time Spent: 10m
Work Description: mareksimunek commented on pull request #7399:
[BEAM-6350] Reuse PCollectionView when created in translators
URL: https://github.com/apache/beam/pull/7399#discussion_r246311325
##########
File path:
sdks/java/extensions/euphoria/src/main/java/org/apache/beam/sdk/extensions/euphoria/core/translate/BroadcastHashJoinTranslator.java
##########
@@ -35,22 +38,41 @@
* Translator for {@link org.apache.beam.sdk.extensions.euphoria.core.client.operator.RightJoin} and
* {@link org.apache.beam.sdk.extensions.euphoria.core.client.operator.LeftJoin} when one side of
* the join fits in memory so it can be distributed in hash map with the other side.
+ *
+ * <p>Note that when reusing smaller join side to several broadcast hash joins there are some rules
+ * to follow to avoid data to be send to executors repeatedly:
+ *
+ * <ul>
+ * <li>Input {@link PCollection} of broadcasted side has to be the same instance
+ * <li>Key extractor of broadcasted side has to be the same {@link UnaryFunction} instance
+ * </ul>
*/
public class BroadcastHashJoinTranslator<LeftT, RightT, KeyT, OutputT>
extends AbstractJoinTranslator<LeftT, RightT, KeyT, OutputT> {
+ /**
+ * Used to prevent multiple views to the same input PCollection. And therefore multiple broadcasts
+ * of the same data.
+ */
+ private Table<PCollection<?>, UnaryFunction<?, KeyT>, PCollectionView<?>> pViews =
Review comment:
It would be better to declare this field `final`.
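
For illustration only, here is a minimal sketch of the caching idea from the diff above, with the cache field declared `final`. It is not the code from the pull request: `ViewCache`, `getOrCreate` and the generic type parameters are hypothetical stand-ins, and plain JDK identity maps are used instead of the `Table` type so the example runs without extra dependencies. It assumes, as the Javadoc in the diff states, that reuse is decided by the input and the key extractor being the same instances.

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical stand-in for the translator's view cache; not part of the Beam API. */
final class ViewCache<InputT, KeyFnT, ViewT> {

  // Declared final: the reference is assigned once, only the map contents change.
  // IdentityHashMap compares keys by reference (==), which makes the
  // "has to be the same instance" rule explicit. Not thread-safe.
  private final Map<InputT, Map<KeyFnT, ViewT>> views = new IdentityHashMap<>();

  /**
   * Returns the view cached for this (input, keyFn) instance pair.
   * The factory is only invoked the first time the pair is seen.
   */
  ViewT getOrCreate(InputT input, KeyFnT keyFn, Supplier<ViewT> factory) {
    return views
        .computeIfAbsent(input, k -> new IdentityHashMap<>())
        .computeIfAbsent(keyFn, k -> factory.get());
  }
}
```

With a cache like this, two joins that pass the same right-side input and the same key extractor instance would share one view, while a different key extractor instance would still create a new one.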
----------------------------------------------------------------
Issue Time Tracking
-------------------
Worklog Id: (was: 183027)
Time Spent: 1h 10m (was: 1h)
> Reuse same PCollectionView when created in translators
> -------------------------------------------------------
>
> Key: BEAM-6350
> URL: https://issues.apache.org/jira/browse/BEAM-6350
> Project: Beam
> Issue Type: Improvement
> Components: dsl-euphoria
> Affects Versions: 2.11.0
> Reporter: Marek Simunek
> Assignee: David Moravek
> Priority: Major
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> If BroadcastHashJoinTranslator is used for a LeftJoin, a PCollectionView
> (used as a side input) is created from the right side.
> If the same right side is used in multiple joins, the PCollectionView is
> created multiple times, so the same data is broadcast repeatedly.
> E.g.
> {code:java}
> LeftJoin.of(left, right)..
> LeftJoin.of(anotherLeftPcollection, right)..
> {code}
> This happens, for example, when we want to work around a skewed join.