[ https://issues.apache.org/jira/browse/BEAM-4658?focusedWorklogId=127903&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-127903 ]
ASF GitHub Bot logged work on BEAM-4658:
----------------------------------------
Author: ASF GitHub Bot
Created on: 26/Jul/18 19:48
Start Date: 26/Jul/18 19:48
Worklog Time Spent: 10m
Work Description: tweise commented on a change in pull request #6050:
[BEAM-4658] Update pipeline representation in runner support libraries to handle timers.
URL: https://github.com/apache/beam/pull/6050#discussion_r205581055
##########
File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/GreedyPCollectionFusers.java
##########
@@ -150,41 +154,51 @@ private static boolean canFuseParDo(
       // is never possible.
       return false;
     }
-    if (!pipeline.getSideInputs(parDo).isEmpty()) {
-      // At execution time, a Runner is required to only provide inputs to a PTransform that, at
-      // the time the PTransform processes them, the associated window is ready in all side inputs
-      // that the PTransform consumes. For an arbitrary stage, it is significantly complex for the
-      // runner to determine this for each input. As a result, we break fusion to simplify this
-      // inspection. In general, a ParDo which consumes side inputs cannot be fused into an
-      // executable stage alongside any transforms which are upstream of any of its side inputs.
-      return false;
-    } else {
-      try {
-        ParDoPayload payload = ParDoPayload.parseFrom(parDo.getTransform().getSpec().getPayload());
-        if (payload.getStateSpecsCount() > 0 || payload.getTimerSpecsCount() > 0) {
-          // Inputs to a ParDo that uses State or Timers must be key-partitioned, and elements for
-          // a key must execute serially. To avoid checking if the rest of the stage is
-          // key-partitioned and preserves keys, these ParDos do not fuse into an existing stage.
-          return false;
-        }
-      } catch (InvalidProtocolBufferException e) {
-        throw new IllegalArgumentException(e);
+    try {
+      ParDoPayload payload = ParDoPayload.parseFrom(parDo.getTransform().getSpec().getPayload());
+      if (Maps.filterKeys(
+              parDo.getTransform().getInputsMap(), s -> payload.getTimerSpecsMap().containsKey(s))
+          .values()
+          .contains(candidate.getId())) {
+        // Allow fusion across timer PCollections because they are a self loop.
+        return true;
+      } else if (payload.getStateSpecsCount() > 0 || payload.getTimerSpecsCount() > 0) {
+        // Inputs to a ParDo that uses State or Timers must be key-partitioned, and elements for
+        // a key must execute serially. To avoid checking if the rest of the stage is
+        // key-partitioned and preserves keys, these ParDos do not fuse into an existing stage.
+        return false;
+      } else if (!pipeline.getSideInputs(parDo).isEmpty()) {
+        // At execution time, a Runner is required to only provide inputs to a PTransform that, at
+        // the time the PTransform processes them, the associated window is ready in all side inputs
+        // that the PTransform consumes. For an arbitrary stage, it is significantly complex for the
+        // runner to determine this for each input. As a result, we break fusion to simplify this
+        // inspection. In general, a ParDo which consumes side inputs cannot be fused into an
+        // executable stage alongside any transforms which are upstream of any of its side inputs.
+        return false;
       }
+    } catch (InvalidProtocolBufferException e) {
+      throw new IllegalArgumentException(e);
     }
     return true;
   }
   private static boolean parDoCompatibility(
Review comment:
nit: the name of this method sounds odd. Should it be canFuse or something?
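
For readers following the hunk, the order of the checks on the added side can be sketched in isolation. This is only a rough model under stated assumptions, not the Beam implementation: `FusionSketch`, `ParDoInfo`, and its fields are hypothetical stand-ins for the values the hunk reads from `ParDoPayload`, the transform's inputs map, and `pipeline.getSideInputs(parDo)`, and the earlier checks that precede the hunk are left out.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the branch order in the hunk; not Beam code.
class FusionSketch {

  /** Assumed stand-in for the pieces of the ParDo that the hunk inspects. */
  static final class ParDoInfo {
    final Map<String, String> inputs; // local input name -> PCollection id
    final Set<String> timerIds;       // local names of the declared timer specs
    final int stateSpecCount;         // number of declared state specs
    final boolean hasSideInputs;      // whether the ParDo consumes side inputs

    ParDoInfo(
        Map<String, String> inputs, Set<String> timerIds, int stateSpecCount, boolean hasSideInputs) {
      this.inputs = inputs;
      this.timerIds = timerIds;
      this.stateSpecCount = stateSpecCount;
      this.hasSideInputs = hasSideInputs;
    }
  }

  /** Mirrors the if / else-if chain of the hunk for one candidate PCollection. */
  static boolean canFuse(ParDoInfo parDo, String candidateId) {
    // 1. The candidate is one of the ParDo's own timer PCollections: a self loop, so fuse.
    boolean candidateIsTimerInput =
        parDo.inputs.entrySet().stream()
            .anyMatch(e -> parDo.timerIds.contains(e.getKey()) && e.getValue().equals(candidateId));
    if (candidateIsTimerInput) {
      return true;
    }
    // 2. Stateful or timer-using ParDos do not fuse into an existing stage.
    if (parDo.stateSpecCount > 0 || !parDo.timerIds.isEmpty()) {
      return false;
    }
    // 3. ParDos that consume side inputs break fusion as well.
    if (parDo.hasSideInputs) {
      return false;
    }
    return true;
  }

  public static void main(String[] args) {
    Map<String, String> inputs = new HashMap<>();
    inputs.put("main", "pc.main");
    inputs.put("timer1", "pc.timer1");
    ParDoInfo timerParDo = new ParDoInfo(inputs, Collections.singleton("timer1"), 0, false);
    System.out.println(canFuse(timerParDo, "pc.timer1")); // candidate is the timer PCollection
    System.out.println(canFuse(timerParDo, "pc.main"));   // candidate is the main input
  }
}
```

The point of the ordering is that the timer self-loop check has to run before the state/timer check, otherwise a ParDo with timers would refuse to fuse with its own timer PCollection.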
Issue Time Tracking
-------------------
Worklog Id: (was: 127903)
Time Spent: 3h 10m (was: 3h)
> Update pipeline representation in runner support libraries to handle timers
> ---------------------------------------------------------------------------
>
> Key: BEAM-4658
> URL: https://issues.apache.org/jira/browse/BEAM-4658
> Project: Beam
> Issue Type: Sub-task
> Components: runner-core
> Reporter: Luke Cwik
> Assignee: Luke Cwik
> Priority: Major
> Time Spent: 3h 10m
> Remaining Estimate: 0h
>
> Expose relevant timer information on *ExecutableStage* to runners.
> Update fusion logic to handle timers.
> Provide an execution-time interface to fire timers into *RemoteBundle*s and also
> to receive new timers that are being set.
>