villebro commented on issue #29839: URL: https://github.com/apache/superset/issues/29839#issuecomment-2271725637
Thanks for the comments @zhaoyongjie . I think you raised a lot of good points, really appreciate the feedback 👍 > @villebro Thanks for document the SIP. I want to post a bit thought in here. > > I don't think we should replace Celery as task queue. The reasons are: > > Celery is a task queue but the Dask is a parallel computing framework. I'm not familiar with Dask, just have a look its document, it looks like Spark but in Python implementation so it should be different use case for a task queue. After digesting this a bit more last night, I'm also starting to lean back in the direction of Celery for now. While there may be use cases for a full DAG-style computing framework (e.g. caching chart data to be cached before triggering the dashboard or something to that nature), I think the focus of the Dask project is slightly misaligned with what we're looking for. Regarding the other alternatives (Redis Queue, Dramatiq, Huey, APScheduler et al), I don't think any of them offer a clear improvement over Celery. So I agree, I'm kind of feeling like remaining on Celery for now, and postponing the architectural overhaul of Celery. > > Lack of active maintenance of Celery > > Celery is activated, the latest version released at Apr 17, 2024. I was going to say Celery has a long standing bug where the workers silently stop consuming tasks since many years, and this has served as an indicator of stagnation to me, as it has caused significant issues for us and required ugly workarounds: https://github.com/celery/celery/discussions/7276. However, it seems this bug has finally been fixed a few weeks ago as of Celery 5.5.0b1 (!). See Redis broker stability notes here: https://github.com/celery/celery/releases/tag/v5.5.0b1. So maybe things aren't as gloomy as I had thought, and things are slowly moving forward. > > Lack of advanced features that are available in Dask > > Which advanced features should we want to use? I think one of the main features Celery lacks is proper task cancellation, which is critical to us (Dask has good mechanisms for this). Also, Dask has some nice autoscaling features which makes it possible to handle bursty operations better than Celery. Finally, Dask is newer, so it incorporates many of the mechanisms that were not so relevant during Celery's inception. Granted, many of them may not be critical for us right now (e.g. graph/DAG support), but could have use cases when they become available. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
