aamangeldi opened a new issue #8023: CeleryExecutor gevent/eventlet pools need monkey patching
URL: https://github.com/apache/airflow/issues/8023

**Apache Airflow version**: 1.10.9

**Kubernetes version (if you are using kubernetes)** (use `kubectl version`):
```
Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.2", GitCommit:"c97fe5036ef3df2967d086711e6c0c405941e14b", GitTreeState:"clean", BuildDate:"2019-10-15T23:42:50Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"14+", GitVersion:"v1.14.10-gke.17", GitCommit:"bdceba0734835c6cb1acbd1c447caf17d8613b44", GitTreeState:"clean", BuildDate:"2020-01-17T23:10:13Z", GoVersion:"go1.12.12b4", Compiler:"gc", Platform:"linux/amd64"}
```
Note: the issue is not specific to k8s.

**Environment**: Any. I was able to reproduce it with the [CeleryExecutor docker-compose](https://github.com/puckel/docker-airflow/blob/1.10.9/docker-compose-CeleryExecutor.yml) from the puckel repo (code version tagged 1.10.9).

**What happened**:

When the `pool` setting in the `[celery]` section of [airflow.cfg](https://github.com/apache/airflow/blob/master/airflow/config_templates/default_airflow.cfg#L598) is set to `eventlet` or `gevent`, task instances get scheduled, queued, and picked up by the workers, **but never executed**.

**What you expected to happen**:

Task instances should be executed.

The root cause is that the application is not [monkey-patched](https://eventlet.net/doc/patching.html#monkey-patch). Celery handles monkey-patching by default, but only in some scenarios, e.g. when it is invoked via the command line ([more info](https://github.com/celery/celery/search?q=maybe_patch_concurrency&unscoped_q=maybe_patch_concurrency)). Airflow starts Celery workers from Python via [.run()](https://github.com/celery/celery/blob/v4.2.2/celery/bin/worker.py#L232), and that code path does not perform monkey patching.

**How to reproduce it**:

1. Clone [puckel's docker-airflow](https://github.com/puckel/docker-airflow/tree/1.10.9):
   ```
   git clone git@github.com:puckel/docker-airflow.git
   ```
2. Add the following line to the Dockerfile:
   ```
   RUN pip install eventlet
   ```
   Then rebuild the image:
   ```
   docker build --rm -t puckel/docker-airflow:1.10.9 .
   ```
3. Set `pool = eventlet` in [airflow.cfg](https://github.com/puckel/docker-airflow/blob/1.10.9/config/airflow.cfg#L509) (the file is mounted by docker-compose).
4. Spin up the [CeleryExecutor docker-compose](https://github.com/puckel/docker-airflow/blob/1.10.9/docker-compose-CeleryExecutor.yml):
   ```
   docker-compose -f docker-compose-CeleryExecutor.yml up -d
   ```
5. Navigate to http://localhost:8080 and run an example DAG.
6. Notice that no task ever reaches the running state.

**Solution**:

Ideally this should be fixed in [Celery](https://github.com/celery/celery), but in the meantime it might be good to have a workaround here as well. Here is the patch I applied to solve it (on Airflow 1.10.9):
```
--- cli.py	2020-03-27 17:05:45.000000000 -0400
+++ cli-new.py	2020-03-27 17:19:48.000000000 -0400
@@ -1098,7 +1098,10 @@
     }
 
     if conf.has_option("celery", "pool"):
-        options["pool"] = conf.get("celery", "pool")
+        pool = conf.get("celery", "pool")
+        options["pool"] = pool
+        from celery import maybe_patch_concurrency
+        maybe_patch_concurrency(['-P', pool])
 
     if args.daemon:
         pid, stdout, stderr, log_file = setup_locations("worker",
```
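For reference, here is a minimal, hypothetical wrapper that applies the same idea as the patch above without modifying Airflow's source: it calls Celery's `maybe_patch_concurrency` before anything else is imported, then hands off to the normal `airflow worker` entrypoint. The file name `eventlet_worker.py` and the wrapper approach are illustrative only (not part of Airflow); it assumes Airflow 1.10.x, where the CLI lives in `airflow.bin.cli`, and `pool = eventlet` in airflow.cfg.

```python
# eventlet_worker.py -- illustrative workaround sketch, not part of Airflow.
# The monkey patch must be applied before any other module imports
# threading/socket, so do it first.
from celery import maybe_patch_concurrency

# Same call the patch above adds: mimics `celery worker -P eventlet`.
maybe_patch_concurrency(['-P', 'eventlet'])

from airflow.bin.cli import CLIFactory  # Airflow 1.10.x CLI entrypoint

if __name__ == '__main__':
    # Mirror what the `airflow` console script does, restricted to `worker`.
    parser = CLIFactory.get_parser()
    args = parser.parse_args(['worker'])
    args.func(args)
```

Running `python eventlet_worker.py` in place of `airflow worker` should then start an already-patched worker; this is only a sketch of the idea behind the patch, not a tested replacement for it.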
