[ 
https://issues.apache.org/jira/browse/BEAM-6350?focusedWorklogId=183027&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-183027
 ]

ASF GitHub Bot logged work on BEAM-6350:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 09/Jan/19 11:22
            Start Date: 09/Jan/19 11:22
    Worklog Time Spent: 10m 
      Work Description: mareksimunek commented on pull request #7399: 
[BEAM-6350] Reuse PCollectionView when created in translators
URL: https://github.com/apache/beam/pull/7399#discussion_r246311325
 
 

 ##########
 File path: sdks/java/extensions/euphoria/src/main/java/org/apache/beam/sdk/extensions/euphoria/core/translate/BroadcastHashJoinTranslator.java
 ##########
 @@ -35,22 +38,41 @@
  * Translator for {@link org.apache.beam.sdk.extensions.euphoria.core.client.operator.RightJoin} and
  * {@link org.apache.beam.sdk.extensions.euphoria.core.client.operator.LeftJoin} when one side of
  * the join fits in memory so it can be distributed in hash map with the other side.
+ *
+ * <p>Note that when reusing smaller join side to several broadcast hash joins there are some rules
+ * to follow to avoid data to be send to executors repeatedly:
+ *
+ * <ul>
+ *   <li>Input {@link PCollection} of broadcasted side has to be the same instance
+ *   <li>Key extractor of broadcasted side has to be the same {@link UnaryFunction} instance
+ * </ul>
  */
 public class BroadcastHashJoinTranslator<LeftT, RightT, KeyT, OutputT>
     extends AbstractJoinTranslator<LeftT, RightT, KeyT, OutputT> {
 
+  /**
+   * Used to prevent multiple views to the same input PCollection. And therefore multiple broadcasts
+   * of the same data.
+   */
+  private Table<PCollection<?>, UnaryFunction<?, KeyT>, PCollectionView<?>> pViews =
 
 Review comment:
   better to add `final` to this field
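
   A minimal sketch of what the suggestion amounts to, using only JDK types so it stands alone. `ViewCache`, `getOrCreate`, and the `factory` parameter are hypothetical names for illustration and are not part of the PR; the nested `IdentityHashMap` stands in for the `Table` field in the diff and assumes that "same instance" means reference identity.

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical stand-in for the translator's view cache; not Beam code. */
final class ViewCache<V> {

  // `final` fixes the reference after construction; the map contents stay mutable.
  // Keys are compared by identity, mirroring the "same instance" rule in the Javadoc.
  private final Map<Object, Map<Object, V>> views = new IdentityHashMap<>();

  /** Returns the cached view for (input, keyExtractor), creating it on first use. */
  V getOrCreate(Object input, Object keyExtractor, Supplier<V> factory) {
    return views
        .computeIfAbsent(input, k -> new IdentityHashMap<>())
        .computeIfAbsent(keyExtractor, k -> factory.get());
  }
}
```

   With the field declared `final`, the cache cannot be reassigned by accident, while repeated lookups with the same input and key extractor instances still return the one view that was created first.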
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 183027)
    Time Spent: 1h 10m  (was: 1h)

> Reuse same PCollectionView when created in translators 
> -------------------------------------------------------
>
>                 Key: BEAM-6350
>                 URL: https://issues.apache.org/jira/browse/BEAM-6350
>             Project: Beam
>          Issue Type: Improvement
>          Components: dsl-euphoria
>    Affects Versions: 2.11.0
>            Reporter: Marek Simunek
>            Assignee: David Moravek
>            Priority: Major
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> If BroadcastHashJoinTranslator is used for a LeftJoin, a PCollectionView is 
> created from the right side (as a side input).
> If the right side is used in multiple joins, the PCollectionView is created 
> multiple times, which is not optimal.
> E.g.:
> {code:java}
> LeftJoin.of(left, right)..
> LeftJoin.of(anotherLeftPcollection, right)..
> {code}
> For example, this happens when we want to handle a skewed join.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
