villebro commented on issue #29839:
URL: https://github.com/apache/superset/issues/29839#issuecomment-2271725637

   Thanks for the comments @zhaoyongjie. I think you raised a lot of good 
points, and I really appreciate the feedback 👍
   
   > @villebro Thanks for document the SIP. I want to post a bit thought in 
here.
   > 
   > I don't think we should replace Celery as task queue. The reasons are:
   > 
   > Celery is a task queue but the Dask is a parallel computing framework. I'm 
not familiar with Dask, just have a look its document, it looks like Spark but 
in Python implementation so it should be different use case for a task queue.
   
   After digesting this a bit more last night, I'm also starting to lean back 
in the direction of Celery for now. While there may be use cases for a full 
DAG-style computing framework (e.g. caching chart data before triggering the 
dashboard, or something of that nature), I think the focus of the Dask project 
is slightly misaligned with what we're looking for. Regarding the other 
alternatives (Redis Queue, Dramatiq, Huey, APScheduler et al), I don't think 
any of them offer a clear improvement over Celery. So I agree: I'm leaning 
toward remaining on Celery for now, and postponing the architectural overhaul.
   
   > > Lack of active maintenance of Celery
   > 
   > Celery is activated, the latest version released at Apr 17, 2024.
   
   I was going to say that Celery has had a long-standing bug, open for many 
years, where the workers silently stop consuming tasks. This has served as an 
indicator of stagnation to me, as it has caused significant issues for us and 
required ugly workarounds: https://github.com/celery/celery/discussions/7276. 
However, it seems this bug was finally fixed a few weeks ago in Celery 5.5.0b1 
(!). See the Redis broker stability notes here: 
https://github.com/celery/celery/releases/tag/v5.5.0b1. So maybe things aren't 
as gloomy as I had thought, and things are slowly moving forward.
   
   > > Lack of advanced features that are available in Dask
   > 
   > Which advanced features should we want to use?
   
   I think one of the main features Celery lacks is proper task cancellation, 
which is critical to us (Dask has good mechanisms for this). Also, Dask has 
some nice autoscaling features which make it possible to handle bursty 
operations better than Celery. Finally, Dask is newer, so it incorporates many 
mechanisms that were not so relevant at Celery's inception. Granted, many of 
them may not be critical for us right now (e.g. graph/DAG support), but they 
could have use cases once they are available to us.
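
   To make the cancellation point a bit more concrete, here is a minimal 
sketch of what I mean by "proper" cancellation. It uses only the Python 
standard library (`concurrent.futures`), not Celery or Dask, so treat it as an 
analogy rather than how either framework works. The names 
`long_running_query` and `demo` are made up for illustration. It shows that 
dropping a task that is still queued is easy, while stopping one that is 
already running needs some cooperation from the task itself:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def long_running_query(started, stop_requested):
    """Hypothetical stand-in for a slow task that polls a stop flag."""
    started.set()
    for _ in range(50):
        if stop_requested.is_set():
            return "cancelled"  # cooperative exit between chunks of work
        time.sleep(0.01)
    return "finished"


def demo():
    started = threading.Event()
    stop_requested = threading.Event()
    # A single worker, so the second submission has to wait in the queue.
    with ThreadPoolExecutor(max_workers=1) as pool:
        running = pool.submit(long_running_query, started, stop_requested)
        queued = pool.submit(long_running_query, started, stop_requested)
        started.wait()  # make sure the first task is actually executing

        queued_cancelled = queued.cancel()    # still pending, so this works
        running_cancelled = running.cancel()  # already running, so it does not

        # The running task only stops because it checks the flag itself.
        stop_requested.set()
        outcome = running.result()
    return queued_cancelled, running_cancelled, outcome


if __name__ == "__main__":
    print(demo())
```

   The running task here stops only because it polls the flag. For us, the 
hard part is the same case: interrupting work that a worker has already picked 
up.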


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

