[ 
https://issues.apache.org/jira/browse/BEAM-6350?focusedWorklogId=182344&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182344
 ]

ASF GitHub Bot logged work on BEAM-6350:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 08/Jan/19 11:41
            Start Date: 08/Jan/19 11:41
    Worklog Time Spent: 10m 
      Work Description: dmvk commented on pull request #7399: [BEAM-6350] Reuse PCollectionView when created in translators
URL: https://github.com/apache/beam/pull/7399#discussion_r245965771
 
 

 ##########
 File path: sdks/java/extensions/euphoria/src/main/java/org/apache/beam/sdk/extensions/euphoria/core/translate/BroadcastHashJoinTranslator.java
 ##########
 @@ -42,15 +41,24 @@
   @Override
   PCollection<KV<KeyT, OutputT>> translate(
       Join<LeftT, RightT, KeyT, OutputT> operator,
-      PCollection<KV<KeyT, LeftT>> left,
-      PCollection<KV<KeyT, RightT>> right) {
+      PCollection<LeftT> left,
+      PCollection<KV<KeyT, LeftT>> leftKeyed,
+      PCollection<RightT> right,
+      PCollection<KV<KeyT, RightT>> rightKeyed) {
+
     final AccumulatorProvider accumulators =
         new LazyAccumulatorProvider(AccumulatorProvider.of(left.getPipeline()));
+
+    // We use PViewsStore to prevent multiple views to the same input PCollection.
+    // And therefore multiple broadcasts of the same data.
+    final PViewsStore pViews =
+        left.getPipeline().getOptions().as(EuphoriaOptions.class).getPCollectionViewsStore();
+
     switch (operator.getType()) {
       case LEFT:
         final PCollectionView<Map<KeyT, Iterable<RightT>>> broadcastRight =
-            right.apply(View.asMultimap());
-        return left.apply(
+            pViews.computeViewAsMultimapIfAbsent(right, rightKeyed);
 
 Review comment:
   I don't think comparing by PCollection is enough, as we can have a different key extractor.
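One way to address the reviewer's concern is to key the view cache on both the input and the key extractor, not on the input alone. The following plain-Java sketch (no Beam dependency) illustrates this. It assumes a `List` stands in for the input PCollection and a `Map<K, List<V>>` stands in for the `View.asMultimap()` result. `ViewCache`, `CacheKey`, and `computeMultimapIfAbsent` are hypothetical names and are not the `PViewsStore` API from the PR.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical sketch: memoizes multimap "views" per (input, key extractor) pair.
 * A List stands in for a PCollection; this is not the PViewsStore API.
 */
class ViewCache {

  /** Cache key that compares both the input and the key extractor by identity. */
  private static final class CacheKey {
    private final Object input;
    private final Object keyExtractor;

    CacheKey(Object input, Object keyExtractor) {
      this.input = input;
      this.keyExtractor = keyExtractor;
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof CacheKey)) {
        return false;
      }
      CacheKey other = (CacheKey) o;
      return input == other.input && keyExtractor == other.keyExtractor;
    }

    @Override
    public int hashCode() {
      return 31 * System.identityHashCode(input) + System.identityHashCode(keyExtractor);
    }
  }

  private final Map<CacheKey, Map<?, ?>> views = new HashMap<>();
  private int buildCount = 0;

  /** Returns the multimap for (input, keyExtractor), building it only on the first request. */
  @SuppressWarnings("unchecked")
  public <K, V> Map<K, List<V>> computeMultimapIfAbsent(
      List<V> input, Function<V, K> keyExtractor) {
    return (Map<K, List<V>>)
        views.computeIfAbsent(
            new CacheKey(input, keyExtractor),
            unused -> {
              buildCount++; // counts cache misses, i.e. views actually built
              Map<K, List<V>> multimap = new HashMap<>();
              for (V value : input) {
                multimap
                    .computeIfAbsent(keyExtractor.apply(value), k -> new ArrayList<>())
                    .add(value);
              }
              return multimap;
            });
  }

  /** Number of times a view was actually built. */
  public int buildCount() {
    return buildCount;
  }
}
```

Comparing by identity is deliberately conservative. Two separately written lambdas are never equal, so they get separate views instead of wrongly sharing one. The cost is that some reuse opportunities may be missed.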
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.


Issue Time Tracking
-------------------

    Worklog Id:     (was: 182344)

> Reuse same PCollectionView when created in translators 
> -------------------------------------------------------
>
>                 Key: BEAM-6350
>                 URL: https://issues.apache.org/jira/browse/BEAM-6350
>             Project: Beam
>          Issue Type: Improvement
>          Components: dsl-euphoria
>    Affects Versions: 2.11.0
>            Reporter: Marek Simunek
>            Assignee: David Moravek
>            Priority: Major
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> If BroadcastHashJoinTranslator is used for a LeftJoin, a PCollectionView (side input) is created from the right side.
> If the same right side is used in multiple joins, the PCollectionView is created multiple times, which is not optimal.
> E.g.:
> {code:java}
> LeftJoin.of(left, right)..
> LeftJoin.of(anotherLeftPcollection, right)..
> {code}
> This happens, for example, when we want to handle a skewed join.
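The reuse the issue asks for can be modeled in plain Java. This minimal sketch assumes that in-memory `List`s stand in for PCollections and that a `Map<K, List<R>>` stands in for the broadcast `View.asMultimap()` side input. `BroadcastLeftJoin`, `buildMultimap`, and `leftJoin` are hypothetical names, not Euphoria API. The right side is grouped once, and two different left inputs probe the same map, as the two `LeftJoin`s in the issue would if they shared a single PCollectionView.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical in-memory model of a broadcast hash left join. */
class BroadcastLeftJoin {

  /** Groups the right side by key once; this plays the role of the broadcast side input. */
  static <K, R> Map<K, List<R>> buildMultimap(List<R> right, Function<R, K> rightKey) {
    Map<K, List<R>> multimap = new HashMap<>();
    for (R r : right) {
      multimap.computeIfAbsent(rightKey.apply(r), k -> new ArrayList<>()).add(r);
    }
    return multimap;
  }

  /**
   * Left join: every left element is emitted at least once, paired with each matching
   * right element, or with Optional.empty() when its key has no match on the right.
   */
  static <K, L, R> List<Map.Entry<L, Optional<R>>> leftJoin(
      List<L> left, Function<L, K> leftKey, Map<K, List<R>> broadcastRight) {
    List<Map.Entry<L, Optional<R>>> out = new ArrayList<>();
    for (L l : left) {
      List<R> matches = broadcastRight.getOrDefault(leftKey.apply(l), List.of());
      if (matches.isEmpty()) {
        out.add(Map.entry(l, Optional.empty()));
      } else {
        for (R r : matches) {
          out.add(Map.entry(l, Optional.of(r)));
        }
      }
    }
    return out;
  }
}
```

Because `buildMultimap` runs once and its result is passed to every `leftJoin` call, the right side is grouped (in Beam terms, broadcast) only once, regardless of how many joins use it. Using `Optional.empty()` for unmatched left elements is a choice made for this sketch.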



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
