Github user srdo commented on a diff in the pull request:
https://github.com/apache/storm/pull/2090#discussion_r117821993
--- Diff:
storm-client/src/jvm/org/apache/storm/windowing/TimeTriggerPolicy.java ---
@@ -62,7 +62,9 @@ public void reset() {
@Override
public void start() {
- executorFuture = executor.scheduleAtFixedRate(newTriggerTask(),
duration, duration, TimeUnit.MILLISECONDS);
+ // initial delay is slightly less than the duration so that the
initial tuples wont't expire due to time drift
+ long initialDelay = duration - Math.min((long) (duration * .1),
10);
+ executorFuture = executor.scheduleAtFixedRate(newTriggerTask(),
initialDelay, duration, TimeUnit.MILLISECONDS);
--- End diff --
Won't this still be vulnerable to losing initial tuples if the first
execution is delayed longer than expected for some reason?
I'm wondering if we could make this more reliable by tracking the time we
consider expired instead of/in addition to tracking what time it is now. Tuples
between the prevExpired timestamp and the nowExpired timestamp are tuples we
can't be sure have been processed, so they should be included in the window.
Since the expired timestamps also move ~duration every onTrigger, we still get
the tuples expired we expect. For the first onTrigger, we can just process all
the tuples because we can tell it's the first call since the prevExpired
timestamp isn't set yet.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---