[ 
https://issues.apache.org/jira/browse/BEAM-7520?focusedWorklogId=326881&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-326881
 ]

ASF GitHub Bot logged work on BEAM-7520:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 11/Oct/19 12:51
            Start Date: 11/Oct/19 12:51
    Worklog Time Spent: 10m 
      Work Description: je-ik commented on pull request #9190: [BEAM-7520] Fix 
timer firing order in DirectRunner
URL: https://github.com/apache/beam/pull/9190#discussion_r333974735
 
 

 ##########
 File path: 
runners/direct-java/src/main/java/org/apache/beam/runners/direct/WatermarkManager.java
 ##########
 @@ -829,6 +864,14 @@ public Instant get() {
   @GuardedBy("refreshLock")
   private final Set<ExecutableT> pendingRefreshes;
 
+  /**
+   * A set of executables with currently extracted timers, that are to be 
processed. Note that, due
 
 Review comment:
   That is correct. The problem is that `WatermarkManager` works at the level 
of `PTransform`s. I could push the logic down to `TransformWatermarks`, which 
is the first level that is somewhat aware of keys (in `extractFiredTimers`), 
but that would mean that timers extracted from 
`AppliedPTransformInputWatermark` would have to be set up again. That is 
because the logic requires each extracted timer to be either fired or set up 
again (the way `StatefulParDoEvaluator` pushes them back). I think this makes 
the logic far too complicated, given that all we gain is the ability to extract 
and execute timers for different keys of the same PTransform in parallel. I 
must emphasize that this affects only DirectRunner, which is already pretty 
slow because it does many validations and its main purpose is testing. 
Moreover, I'm not convinced that implementing this optimization would bring any 
significant performance boost in practice (a speculative claim; I didn't do any 
performance testing). Maybe we can merge it as it is and solve performance 
issues later if they appear (avoiding premature optimization with a significant 
impact on the readability and understandability of already somewhat complicated 
code)?
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 326881)
    Time Spent: 11h 50m  (was: 11h 40m)

> DirectRunner timers are not strictly time ordered
> -------------------------------------------------
>
>                 Key: BEAM-7520
>                 URL: https://issues.apache.org/jira/browse/BEAM-7520
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-direct
>    Affects Versions: 2.13.0
>            Reporter: Jan Lukavský
>            Assignee: Jan Lukavský
>            Priority: Major
>          Time Spent: 11h 50m
>  Remaining Estimate: 0h
>
> Let's suppose we have the following situation:
>  - stateful ParDo with two timers - timerA and timerB
>  - timerA is set for window.maxTimestamp() + 1
>  - timerB is set anywhere in <windowStart, windowEnd); let's denote its 
> timestamp timerB.timestamp
>  - input watermark moves to BoundedWindow.TIMESTAMP_MAX_VALUE
> Then the order of timers is as follows (correct):
>  - timerB
>  - timerA
> But, if timerB sets another timer (say for timerB.timestamp + 1), then the 
> order of timers will be:
>  - timerB (timerB.timestamp)
>  - timerA (BoundedWindow.TIMESTAMP_MAX_VALUE)
>  - timerB (timerB.timestamp + 1)
> This is not ordered by timestamp. The reason is that when the input 
> watermark update is evaluated, WatermarkManager.extractFiredTimers() 
> produces both timerA and timerB. That would be correct on its own, but when 
> timerB sets another timer, the new timer can fire only after the already 
> extracted timerA, which breaks the ordering.
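
The scenario above can be reproduced with a small standalone model. This is only
a sketch under stated assumptions: it does not use Beam, timestamps are plain
longs, and `TimerOrderSketch`, `SketchTimer`, `fireInBatches` and `fireOneByOne`
are hypothetical names invented for illustration. `fireInBatches` mimics
extracting every eligible timer before firing any of them; `fireOneByOne` shows
one possible way to keep the order, which is not necessarily how the pull
request fixes it.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical model of the timer ordering issue; not Beam's actual code. */
public class TimerOrderSketch {

  /** A timer id plus its firing timestamp (a long stands in for Instant). */
  static final class SketchTimer {
    final String id;
    final long timestamp;

    SketchTimer(String id, long timestamp) {
      this.id = id;
      this.timestamp = timestamp;
    }

    @Override
    public String toString() {
      return id + "@" + timestamp;
    }
  }

  // Assumed values: 100 stands in for window.maxTimestamp() + 1.
  static final long TIMER_A_TS = 100L;
  static final long TIMER_B_TS = 10L;

  /** Pending timers ordered by timestamp, seeded with timerA and timerB. */
  static PriorityQueue<SketchTimer> initialTimers() {
    PriorityQueue<SketchTimer> pending =
        new PriorityQueue<>(Comparator.comparingLong((SketchTimer t) -> t.timestamp));
    pending.add(new SketchTimer("timerA", TIMER_A_TS));
    pending.add(new SketchTimer("timerB", TIMER_B_TS));
    return pending;
  }

  /** Fires a timer; the first timerB firing sets timerB again at timestamp + 1. */
  static void fire(SketchTimer t, PriorityQueue<SketchTimer> pending, List<String> fired) {
    fired.add(t.toString());
    if (t.id.equals("timerB") && t.timestamp == TIMER_B_TS) {
      pending.add(new SketchTimer("timerB", t.timestamp + 1));
    }
  }

  /** Extracts all eligible timers first, then fires them (the problematic order). */
  public static List<String> fireInBatches(long watermark) {
    PriorityQueue<SketchTimer> pending = initialTimers();
    List<String> fired = new ArrayList<>();
    while (!pending.isEmpty() && pending.peek().timestamp <= watermark) {
      List<SketchTimer> batch = new ArrayList<>();
      while (!pending.isEmpty() && pending.peek().timestamp <= watermark) {
        batch.add(pending.poll());
      }
      for (SketchTimer t : batch) {
        fire(t, pending, fired); // timers set here wait for the next batch
      }
    }
    return fired;
  }

  /** Re-checks the queue after every firing, so newly set timers are ordered. */
  public static List<String> fireOneByOne(long watermark) {
    PriorityQueue<SketchTimer> pending = initialTimers();
    List<String> fired = new ArrayList<>();
    while (!pending.isEmpty() && pending.peek().timestamp <= watermark) {
      fire(pending.poll(), pending, fired);
    }
    return fired;
  }

  public static void main(String[] args) {
    System.out.println("batched:    " + fireInBatches(Long.MAX_VALUE));
    System.out.println("one by one: " + fireOneByOne(Long.MAX_VALUE));
  }
}
```

In this model the batched variant fires timerB, then timerA, then the re-set
timerB, which matches the unordered sequence listed in the issue. The
one-at-a-time variant trades batch extraction for strict timestamp order.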



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
