guhuajun commented on issue #34219: URL: https://github.com/apache/airflow/issues/34219#issuecomment-2119648252
Greetings, I am evaluating the features of the Airflow project and am still new to it, but it's good to see an old friend, Celery, a project I have been familiar with since 2015.

After reading your comment, maybe you can start with a simple one: [Play Celery with django](https://docs.celeryq.dev/en/stable/django/index.html). It's easy to start two workers with docker-compose and have some fun. Since we already have [DagRun](https://airflow.apache.org/docs/apache-airflow/stable/_api/airflow/models/dagrun/index.html#module-airflow.models.dagrun), once the DAG files are synced/replicated to all workers, this should work. Good luck! `airflow pools join` will be a major milestone for this project. :)

> I've had a look at how the helm chart is structured and my approach would be as follows:
>
> First, identify the minimum options that need to be configurable for the feature to be useful. The ones I'm seeing are:
>
> * worker image
> * command & args
> * replicas
> * resources
> * autoscaling (keda)
>
> Since setting the image specifically for workers is not yet configurable, that would be a first standalone PR.
>
> Next, I would leave the current `workers:` key exactly as it is (both for backward compatibility and because there will always be at least one worker type) and introduce an `additional-celery-workers:` key, where the aforementioned options can be specified and everything else is the same as for the default worker type.
>
> I believe most worker-related components can be exactly the same for all workers (at least for now), so they should not require changes:
>
> * service account
> * service
> * network policy
> * DB connection setup for keda (e.g. pgbouncer network policy)
>
> What does need to change is Keda: we will have to create additional ScaledObjects to match the additional Deployments/StatefulSets. What I'm a bit unsure of is the [query](https://github.com/apache/airflow/blob/main/chart/values.yaml#L585). Currently, it just lists everything from a table named `task_instance`. Does this table contain queue-related info, or can it be easily added?
>
> Let me know if the overall approach sounds reasonable.
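The `additional-celery-workers:` idea from the quoted proposal works like this: list only the per-type options and inherit everything else from `workers:`. That can be sketched as a small merge step. This is a minimal Python sketch, not chart code. The entry fields `name` and `queue`, the exact nesting, and the helper names `deep_merge`/`resolve_worker_types` are all hypothetical. A real implementation would live in the chart's Helm templates. The `args` override assumes each extra worker type runs `airflow celery worker --queues <queue>` so that it only consumes its own queue.

```python
from __future__ import annotations

import copy
from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `base` with `override` applied, recursing into nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def resolve_worker_types(values: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Expand `workers` plus each (hypothetical) `additional-celery-workers` entry.

    Each extra entry needs a `name`; any other key overrides the default worker
    settings, and everything it does not mention is inherited from `workers`.
    """
    default = values["workers"]
    resolved = {"default": copy.deepcopy(default)}
    for entry in values.get("additional-celery-workers", []):
        name = entry["name"]
        if name in resolved:
            raise ValueError(f"duplicate worker type name: {name!r}")
        overrides = {k: v for k, v in entry.items() if k != "name"}
        resolved[name] = deep_merge(default, overrides)
    return resolved


# Hypothetical values layout, loosely modelled on the chart's `workers:` section.
values = {
    "workers": {
        "replicas": 1,
        "args": ["bash", "-c", "exec airflow celery worker"],
        "resources": {"limits": {"cpu": "1", "memory": "2Gi"}},
        "keda": {"enabled": True, "maxReplicaCount": 10},
    },
    "additional-celery-workers": [
        {
            "name": "gpu",
            "queue": "gpu",
            "args": ["bash", "-c", "exec airflow celery worker --queues gpu"],
            "resources": {"limits": {"nvidia.com/gpu": "1"}},
            "keda": {"maxReplicaCount": 2},
        }
    ],
}

resolved = resolve_worker_types(values)
print(resolved["gpu"]["resources"]["limits"])
```

A recursive merge means a `resources` or `keda` override only has to spell out the keys that differ. If inheriting, say, the default CPU limit is unwanted, a shallow merge that replaces whole sub-dicts would be the stricter alternative.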
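On the question about the KEDA query: Airflow's `task_instance` table does have a `queue` column, filled from each task's `queue` argument. Each additional ScaledObject could therefore filter on it. Below is a minimal Python sketch of such a per-queue query, exercised against a toy SQLite table. Several details are assumptions for illustration rather than the chart's actual template: the helper name `desired_workers_query`, the `running`/`queued` state filter, and rendering the queue name straight into the SQL string (as a Helm template would).

```python
import sqlite3


def desired_workers_query(queue: str, worker_concurrency: int) -> str:
    """Render a per-queue scaling query (hypothetical helper)."""
    if worker_concurrency < 1:
        raise ValueError("worker_concurrency must be >= 1")
    escaped = queue.replace("'", "''")  # escape single quotes for a SQL string literal
    # Integer ceiling division: (n + c - 1) / c, i.e. enough workers for n active tasks.
    return (
        f"SELECT (COUNT(*) + {worker_concurrency} - 1) / {worker_concurrency} "
        "FROM task_instance "
        f"WHERE state IN ('running', 'queued') AND queue = '{escaped}'"
    )


# Toy task_instance table containing only the two columns the query uses.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task_instance (state TEXT, queue TEXT)")
conn.executemany(
    "INSERT INTO task_instance (state, queue) VALUES (?, ?)",
    [("running", "default")] * 2
    + [("queued", "default")]
    + [("success", "default")] * 4
    + [("queued", "gpu")] * 4
    + [("running", "gpu")],
)

desired = {
    q: conn.execute(desired_workers_query(q, worker_concurrency=2)).fetchone()[0]
    for q in ("default", "gpu", "idle")
}
print(desired)  # {'default': 2, 'gpu': 3, 'idle': 0}
```

The integer form `(n + c - 1) / c` is used instead of `ceil(...)`. Because `COUNT(*)` is an integer in both SQLite and PostgreSQL, the division truncates in both. A queue with no running or queued tasks yields 0, which would let KEDA scale that worker type down to its configured minimum.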
