villebro commented on PR #25239: URL: https://github.com/apache/superset/pull/25239#issuecomment-1721904578
> As I was making the above change, I realized that if the celery queue was backed up, or there was any kind of delay between when celery beat placed the job into the queue and when it was executed, alerts & reports could be missed.

@jfrag1 I actually ran into this exact issue, and after investigating it further, I found that unfortunately Celery only supports passing static variables from the scheduler to the worker 🙁 So I wasn't able to come up with a clean way of solving this. However, I think one alternative solution could be as follows:

1. The scheduler only triggers report scheduling at some fixed interval, say once per minute.
2. This task is picked up by a worker, which acquires a distributed lock.
3. Instead of using the current time, the worker checks the key-value store for when alerts were last executed, and then executes any reports scheduled in the interval between the last execution time and now.
4. After this, the last run time is updated and the lock is released.

If another worker tries to start reports at the same time, it would silently exit, as the lock would already be taken. This would ensure the following:

1. No reports would be missed: if the queue gets clogged up, at some point a report scheduling task would get through, and it would then be able to clear the backlog of reports.
2. Duplicates would not happen, as the distributed lock would ensure that only one worker does report scheduling at a time.

Thoughts @jfrag1 @zephyring? I have code that I use internally for distributed locks on the Superset `key_value` store (we use it in another context, but it's been rock solid for the last year or so), so I can collab on this if needed.
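The four steps proposed above could be sketched roughly as follows. This is a minimal, in-process illustration only. It is not the internal lock code mentioned in the comment and not Superset's API: `InMemoryStore`, `Report`, `due_times`, `run_scheduler`, the `reports_last_run` key, the fixed-interval schedule (real reports use cron expressions), and the one-minute look-back on the first run are all assumptions made for the example.

```python
from __future__ import annotations

import threading
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical reference point and key name, chosen only for this sketch.
EPOCH = datetime(2023, 1, 1, tzinfo=timezone.utc)
LAST_RUN_KEY = "reports_last_run"


class InMemoryStore:
    """Hypothetical stand-in for a shared key-value store with a lock."""

    def __init__(self) -> None:
        self._data: dict[str, datetime] = {}
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        # Non-blocking: a worker that loses the race gets False immediately.
        return self._lock.acquire(blocking=False)

    def release(self) -> None:
        self._lock.release()

    def get(self, key: str) -> datetime | None:
        return self._data.get(key)

    def set(self, key: str, value: datetime) -> None:
        self._data[key] = value


@dataclass(frozen=True)
class Report:
    name: str
    every: timedelta  # simplified schedule: fires at multiples of `every` after EPOCH


def due_times(report: Report, start: datetime, end: datetime) -> list[datetime]:
    """Return the firing times of `report` in the interval (start, end]."""
    periods = (start - EPOCH) // report.every + 1  # first firing strictly after start
    t = EPOCH + periods * report.every
    out = []
    while t <= end:
        out.append(t)
        t += report.every
    return out


def run_scheduler(
    store: InMemoryStore, reports: list[Report], now: datetime
) -> list[tuple[str, datetime]]:
    """Return the (report name, scheduled time) pairs handled by this invocation."""
    if not store.try_acquire():
        return []  # step 2: another worker holds the lock, so exit silently
    try:
        # Step 3: use the stored last-run time instead of the current time.
        last_run = store.get(LAST_RUN_KEY)
        if last_run is None:
            last_run = now - timedelta(minutes=1)  # assumption for the first run
        executed = [(r.name, t) for r in reports for t in due_times(r, last_run, now)]
        store.set(LAST_RUN_KEY, now)  # step 4: record the new last-run time
        return executed
    finally:
        store.release()  # step 4: release the lock even if something failed
```

If the scheduling task is delayed by a clogged queue, the next invocation that gets through picks up every firing time since the stored last run, and a repeated invocation with the same `now` finds nothing left to do. A lock shared between real workers would presumably also need an expiry so that a crashed worker cannot hold it forever; that is left out here.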
To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
