[ https://issues.apache.org/jira/browse/BEAM-6350?focusedWorklogId=182344&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182344 ]
ASF GitHub Bot logged work on BEAM-6350:
----------------------------------------
Author: ASF GitHub Bot
Created on: 08/Jan/19 11:41
Start Date: 08/Jan/19 11:41
Worklog Time Spent: 10m
Work Description: dmvk commented on pull request #7399: [BEAM-6350] Reuse
PCollectionView when created in translators
URL: https://github.com/apache/beam/pull/7399#discussion_r245965771
##########
File path:
sdks/java/extensions/euphoria/src/main/java/org/apache/beam/sdk/extensions/euphoria/core/translate/BroadcastHashJoinTranslator.java
##########
@@ -42,15 +41,24 @@
@Override
PCollection<KV<KeyT, OutputT>> translate(
Join<LeftT, RightT, KeyT, OutputT> operator,
- PCollection<KV<KeyT, LeftT>> left,
- PCollection<KV<KeyT, RightT>> right) {
+ PCollection<LeftT> left,
+ PCollection<KV<KeyT, LeftT>> leftKeyed,
+ PCollection<RightT> right,
+ PCollection<KV<KeyT, RightT>> rightKeyed) {
+
final AccumulatorProvider accumulators =
new LazyAccumulatorProvider(AccumulatorProvider.of(left.getPipeline()));
+
+ // We use PViewsStore to prevent multiple views to the same input PCollection.
+ // And therefore multiple broadcasts of the same data.
+ final PViewsStore pViews =
+ left.getPipeline().getOptions().as(EuphoriaOptions.class).getPCollectionViewsStore();
+
switch (operator.getType()) {
case LEFT:
final PCollectionView<Map<KeyT, Iterable<RightT>>> broadcastRight =
- right.apply(View.asMultimap());
- return left.apply(
+ pViews.computeViewAsMultimapIfAbsent(right, rightKeyed);
Review comment:
I don't think comparing by PCollection alone is enough, as the same
PCollection can be used with a different key extractor.
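
To illustrate the concern, here is a minimal sketch in plain Java of a store keyed on both the input and the key extractor, so the same input keyed in two different ways gets two separate views. It is not the PViewsStore from this PR: `ViewCache`, `CacheKey`, and this `computeIfAbsent` signature are hypothetical names, the input and view are plain objects standing in for PCollection and PCollectionView, and comparing the extractor by identity is an assumption (lambdas do not implement `equals`).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical stand-in for a view store; not the actual PViewsStore from the PR. */
final class ViewCache {

  /** Cache key made of the input and the key extractor, both compared by identity. */
  private static final class CacheKey {
    private final Object input;
    private final Object keyExtractor;

    CacheKey(Object input, Object keyExtractor) {
      this.input = input;
      this.keyExtractor = keyExtractor;
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof CacheKey)) {
        return false;
      }
      CacheKey other = (CacheKey) o;
      return input == other.input && keyExtractor == other.keyExtractor;
    }

    @Override
    public int hashCode() {
      return 31 * System.identityHashCode(input) + System.identityHashCode(keyExtractor);
    }
  }

  private final Map<CacheKey, Object> views = new HashMap<>();

  /** Returns the cached view for (input, keyExtractor), creating it on first use. */
  @SuppressWarnings("unchecked")
  <V> V computeIfAbsent(Object input, Function<?, ?> keyExtractor, Supplier<V> viewFactory) {
    return (V) views.computeIfAbsent(new CacheKey(input, keyExtractor), k -> viewFactory.get());
  }

  /** Number of distinct views created so far. */
  int size() {
    return views.size();
  }
}
```

With a key like this, two joins that share both the right side and the extractor instance reuse one view, while a join that keys the same right side differently gets its own.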
----------------------------------------------------------------
Issue Time Tracking
-------------------
Worklog Id: (was: 182344)
> Reuse same PCollectionView when created in translators
> -------------------------------------------------------
>
> Key: BEAM-6350
> URL: https://issues.apache.org/jira/browse/BEAM-6350
> Project: Beam
> Issue Type: Improvement
> Components: dsl-euphoria
> Affects Versions: 2.11.0
> Reporter: Marek Simunek
> Assignee: David Moravek
> Priority: Major
> Time Spent: 40m
> Remaining Estimate: 0h
>
> If BroadcastHashJoinTranslator is used for a LeftJoin, a PCollectionView
> (used as a side input) is created from the right side.
> If the same right side is used in multiple joins, the PCollectionView is
> created multiple times, which is not optimal.
> E.g.:
> {code:java}
> LeftJoin.of(left, right)..
> LeftJoin.of(anotherLeftPcollection, right)..
> {code}
> For example, this happens when we want to solve a skewed join.