villebro commented on PR #25239:
URL: https://github.com/apache/superset/pull/25239#issuecomment-1721904578

   > As I was making the above change, I realized that if the celery queue was 
backed up, or there was any kind of delay between when celery beat placed the 
job into the queue and when it was executed, alerts & reports could be missed.
   
   @jfrag1 I actually ran into this exact issue, and after investigating it 
further, I found that, unfortunately, Celery only supports passing static 
arguments from the scheduler (celery beat) to the worker, so the task cannot be 
told the time it was originally scheduled for 🙁 I wasn't able to come up with a 
clean way of solving this. However, I think one alternative solution could be 
as follows:
   
   1. The scheduler only triggers report scheduling at a fixed interval, say 
once per minute.
   2. This task is picked up by a worker, which acquires a distributed lock.
   3. Instead of using the current time, the worker reads from the key-value 
store when alerts were last executed, and then executes any reports scheduled 
in the interval between the last execution time and now.
   4. After this, the last execution time is updated and the lock is released.
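   Step 3 is what prevents missed runs: instead of asking what is due right 
now, the worker asks what came due since the last recorded run. Below is a 
minimal sketch of that window computation. It assumes each report fires on a 
fixed period from an anchor time (Superset's report schedules are cron-based, so 
a real version would enumerate cron ticks instead). `FixedSchedule`, 
`ticks_between` and `reports_due` are hypothetical names:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass(frozen=True)
class FixedSchedule:
    """Hypothetical report schedule: fires at anchor + k * period for k = 0, 1, 2, ..."""
    name: str
    anchor: datetime
    period: timedelta


def ticks_between(schedule: FixedSchedule, last_run: datetime, now: datetime) -> List[datetime]:
    """Return every scheduled time t with last_run < t <= now (a half-open window)."""
    if last_run < schedule.anchor:
        k = 0  # the first tick after last_run is the anchor itself
    else:
        # index of the first tick strictly after last_run
        k = (last_run - schedule.anchor) // schedule.period + 1
    ticks = []
    tick = schedule.anchor + k * schedule.period
    while tick <= now:
        ticks.append(tick)
        k += 1
        tick = schedule.anchor + k * schedule.period
    return ticks


def reports_due(
    schedules: List[FixedSchedule], last_run: datetime, now: datetime
) -> List[Tuple[datetime, str]]:
    """Collect every (scheduled_time, report_name) pair that came due since last_run, oldest first."""
    due = [(t, s.name) for s in schedules for t in ticks_between(s, last_run, now)]
    return sorted(due)
```

   Using the half-open window (last_run, now] means a tick that falls exactly on 
a boundary runs in one window and is skipped by the next, so consecutive runs 
neither drop nor repeat it. For example, if the last recorded run was 12:00 and 
the trigger only gets through at 12:12, a five-minute report would be run for 
both 12:05 and 12:10.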
   
   If another worker tries to start reports at the same time, it would exit 
silently, since the lock would already be taken. This would ensure the 
following:
   
   1. No reports would be missed: if the queue were clogged, a report 
scheduling task would eventually get through and clear the backlog of reports.
   2. No duplicates would occur, as the distributed lock ensures that only one 
worker performs report scheduling at a time.
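   The lock in step 2 only needs one primitive from the store: an atomic 
"create this key if it does not already exist". Below is a minimal 
single-process sketch of that pattern. It is not the Superset `key_value` API or 
the internal code mentioned below. It assumes a hypothetical `KeyValueStore` 
whose `create_if_absent` is atomic (in a real deployment this would be backed by 
something like a database insert on a unique key or Redis `SET` with `NX`, not a 
Python dict). The TTL and the `run_exclusively` helper are also assumptions, 
added so a crashed worker cannot hold the lock forever:

```python
import threading
import time
import uuid
from typing import Callable, Dict, Tuple


class KeyValueStore:
    """Hypothetical in-memory stand-in for a shared key-value store."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[str, float]] = {}  # key -> (value, expiry on monotonic clock)
        self._mutex = threading.Lock()  # makes each operation atomic within this process only

    def create_if_absent(self, key: str, value: str, ttl: float) -> bool:
        now = time.monotonic()
        with self._mutex:
            current = self._data.get(key)
            if current is not None and current[1] > now:
                return False  # key exists and has not expired
            self._data[key] = (value, now + ttl)
            return True

    def delete_if_value(self, key: str, value: str) -> None:
        with self._mutex:
            current = self._data.get(key)
            if current is not None and current[0] == value:
                del self._data[key]


LOCK_KEY = "reports_scheduler_lock"  # hypothetical key name


def run_exclusively(store: KeyValueStore, task: Callable[[], None], ttl: float = 300.0) -> bool:
    """Run `task` only if this worker wins the lock; otherwise return False without running it."""
    token = str(uuid.uuid4())  # identifies this holder so we never release someone else's lock
    if not store.create_if_absent(LOCK_KEY, token, ttl):
        return False
    try:
        task()
        return True
    finally:
        store.delete_if_value(LOCK_KEY, token)
```

   In the scheme above, `task` would read the last execution time, run the 
reports that came due since then, and write the new execution time before the 
lock is released. One design caveat with a TTL: if the task can run longer than 
the TTL, the lock may expire and let a second worker in, so the TTL should 
comfortably exceed the worst-case run time.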
   
   Thoughts, @jfrag1 @zephyring? I have code that I use internally for 
distributed locks on the Superset `key_value` store (we use it in another 
context, but it has been rock solid for the past year or so), so I can 
collaborate on this if needed.