dustin12 commented on code in PR #36596:
URL: https://github.com/apache/beam/pull/36596#discussion_r2475192328


##########
sdks/python/apache_beam/transforms/async_dofn.py:
##########
@@ -256,10 +256,12 @@ def schedule_item(self, element, ignore_buffer=False, *args, **kwargs):
         total_sleep += sleep_time
         sleep(sleep_time)
 
-  def next_time_to_fire(self):
+  def next_time_to_fire(self, key):
+    random.seed(key)
     return (
         floor((time() + self._timer_frequency) / self._timer_frequency) *
-        self._timer_frequency)
+        self._timer_frequency) + (
+            random.random() * self._timer_frequency)

Review Comment:
   I started by just having each key set a timer for now + 10s.  That doesn't 
work because as new work arrives the timer's firing time keeps getting pushed 
out, i.e. an element arrives at t=1, we want to check back on it at t=11 so we 
set the timer, but then an element arrives at t=9 and overwrites the timer to 
t=19.
   
   The next setup had the firing time round up to the next increment, so any 
message that arrives between t=0 and t=10 sets the timer for t=10.  That way 
the element at t=9 doesn't override the timer to t=19 but keeps it at t=10.
   
   That works, but it means we see a spike of timers at t=10, t=20, t=30, 
etc.  There isn't any reason the timers all need to fire at these round 
increments, so this change attempts to add fuzzing per key (since timers are 
per key).  Ideally any one key still has buckets 10s apart, so the 
overwriting problem stays fixed, but across multiple keys the buckets don't 
all fire at the same time.  I believe this is what the random.seed(key) on 
line 260 is doing, but correct me if I'm wrong.
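   
   For reference, a minimal standalone sketch of the pattern described above 
(the module-level function and the timer_frequency argument are placeholders 
standing in for the DoFn method and self._timer_frequency):
   
       import random
       from math import floor
       from time import time
       
       def next_time_to_fire(key, timer_frequency=10.0):
         # Every element that arrives within one window rounds up to the
         # same bucket boundary, so a later arrival can't push the timer out.
         bucket = floor(
             (time() + timer_frequency) / timer_frequency) * timer_frequency
         # Seeding with the key makes the jitter deterministic per key: one
         # key always gets the same offset in [0, timer_frequency), while
         # different keys spread out instead of all firing on the round
         # increments.
         random.seed(key)
         return bucket + random.random() * timer_frequency
   
   (random.seed(key) reseeds the global random module state; that reseed is 
what makes the offset reproducible for a given key.)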
   
   Also, let me know if you know an easier way to obtain this pattern.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]