[
https://issues.apache.org/jira/browse/BEAM-10400?focusedWorklogId=458156&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-458156
]
ASF GitHub Bot logged work on BEAM-10400:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 13/Jul/20 15:36
Start Date: 13/Jul/20 15:36
Worklog Time Spent: 10m
Work Description: je-ik commented on a change in pull request #12155:
URL: https://github.com/apache/beam/pull/12155#discussion_r453739657
##########
File path:
runners/direct-java/src/main/java/org/apache/beam/runners/direct/QuiescenceDriver.java
##########
@@ -70,6 +71,8 @@ public static ExecutionDriver create(
   private final Map<AppliedPTransform<?, ?, ?>, ConcurrentLinkedQueue<CommittedBundle<?>>>
       pendingRootBundles;
   private final Queue<WorkUpdate> pendingWork = new ConcurrentLinkedQueue<>();
+  private final Map<AppliedPTransform<?, ?, ?>, Collection<CommittedBundle<?>>> inflightBundles =
+      new ConcurrentHashMap<>();
Review comment:
Moved the comment to code. One more note: the issue arises because the
output watermark of a PTransform is directly connected to the input watermark of
the downstream PTransform. Maybe a more "technically correct" solution would be to
attach output watermark updates to bundle processing, so that the bundle
life-cycle would become:
- start bundle
- finish bundle (and enqueue any resulting bundles to pendingUpdates in the
downstream PTransform)
- update output watermark
But that would most probably require a more complex refactor.
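The life-cycle proposed above could be sketched roughly as follows. This is a
simplified, single-threaded model and not DirectRunner code: the class names
(BundleLifecycleSketch, Upstream, Downstream) and their methods are hypothetical,
and the sketch assumes that pending elements hold back the downstream input
watermark. It only illustrates why enqueueing the resulting bundle before
advancing the output watermark keeps the downstream input watermark from
overtaking data that was already emitted.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical, simplified model of the proposed ordering; not the DirectRunner API. */
final class BundleLifecycleSketch {

  /** Downstream transform: pending bundles (element timestamps) plus the upstream watermark. */
  static final class Downstream {
    private final Queue<List<Long>> pendingBundles = new ArrayDeque<>();
    private long upstreamWatermark = Long.MIN_VALUE;

    void enqueue(List<Long> bundle) {
      pendingBundles.add(bundle);
    }

    void onUpstreamWatermark(long watermark) {
      // Watermarks only move forward.
      upstreamWatermark = Math.max(upstreamWatermark, watermark);
    }

    /** Assumption: pending elements hold the input watermark at their earliest timestamp. */
    long inputWatermark() {
      long held = upstreamWatermark;
      for (List<Long> bundle : pendingBundles) {
        for (long timestamp : bundle) {
          held = Math.min(held, timestamp);
        }
      }
      return held;
    }

    /** Removes the next pending bundle, as if it had been processed. */
    List<Long> poll() {
      return pendingBundles.poll();
    }
  }

  /** Upstream transform that ties the output watermark update to bundle processing. */
  static final class Upstream {
    private final Downstream downstream;
    private long outputWatermark = Long.MIN_VALUE;

    Upstream(Downstream downstream) {
      this.downstream = downstream;
    }

    void processBundle(List<Long> inputTimestamps, long newWatermark) {
      // 1. start bundle (identity transform, for illustration only)
      List<Long> output = new ArrayList<>(inputTimestamps);
      // 2. finish bundle: enqueue any resulting bundle downstream first
      if (!output.isEmpty()) {
        downstream.enqueue(output);
      }
      // 3. update output watermark only after the output is visible downstream
      outputWatermark = Math.max(outputWatermark, newWatermark);
      downstream.onUpstreamWatermark(outputWatermark);
    }
  }
}
```

With this ordering, the downstream input watermark in the sketch stays at the
earliest pending timestamp until the enqueued bundle has been polled, and only
then advances to the upstream output watermark.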
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 458156)
Time Spent: 2h 40m (was: 2.5h)
> DirectRunner: race condition in watermark update
> ------------------------------------------------
>
> Key: BEAM-10400
> URL: https://issues.apache.org/jira/browse/BEAM-10400
> Project: Beam
> Issue Type: Bug
> Components: runner-direct
> Affects Versions: 2.23.0
> Reporter: Jan Lukavský
> Assignee: Jan Lukavský
> Priority: P2
> Time Spent: 2h 40m
> Remaining Estimate: 0h
>
> When the watermark is updated in an upstream PTransform, it is *instantly*
> propagated as the input watermark to all directly connected downstream
> PTransforms. We must ensure that all output bundles that were emitted before
> the watermark update are processed before we allow updating the input watermark
> of the downstream PTransform.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)