aamangeldi opened a new issue #8023: CeleryExecutor gevent/eventlet pools need 
monkey patching
URL: https://github.com/apache/airflow/issues/8023
 
 
   **Apache Airflow version**:
   1.10.9
   
   **Kubernetes version (if you are using kubernetes)** (use `kubectl 
version`): 
   ```
   Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.2", 
GitCommit:"c97fe5036ef3df2967d086711e6c0c405941e14b", GitTreeState:"clean", 
BuildDate:"2019-10-15T23:42:50Z", GoVersion:"go1.12.10", Compiler:"gc", 
Platform:"darwin/amd64"}
   Server Version: version.Info{Major:"1", Minor:"14+", 
GitVersion:"v1.14.10-gke.17", 
GitCommit:"bdceba0734835c6cb1acbd1c447caf17d8613b44", GitTreeState:"clean", 
BuildDate:"2020-01-17T23:10:13Z", GoVersion:"go1.12.12b4", Compiler:"gc", 
Platform:"linux/amd64"}
   ```
   Note: the issue is not specific to k8s.
   
   **Environment**:
   Any. I was able to reproduce using [CeleryExecutor 
docker-compose](https://github.com/puckel/docker-airflow/blob/1.10.9/docker-compose-CeleryExecutor.yml)
 in the puckel repo (code version tagged as 1.10.9).
   
   **What happened**:
   When setting the `pool` setting in the `[celery]` section [in 
airflow.cfg](https://github.com/apache/airflow/blob/master/airflow/config_templates/default_airflow.cfg#L598)
 to `eventlet` or `gevent`, task instances get scheduled, queued, picked up by 
the workers, **but not executed**.
   
   **What you expected to happen**:
   Task instances should be executed. The problem is that the application is 
not [monkey-patched](https://eventlet.net/doc/patching.html#monkey-patch). 
Celery normally handles monkey-patching itself, but not in all scenarios: it only does so when invoked via the command line ([more info](https://github.com/celery/celery/search?q=maybe_patch_concurrency&unscoped_q=maybe_patch_concurrency)).
   
   Airflow invokes Celery workers in Python via 
[.run()](https://github.com/celery/celery/blob/v4.2.2/celery/bin/worker.py#L232).
 Unfortunately, this function does not handle monkey patching.
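   For illustration, here is a simplified stand-in (not Celery's actual code) for the check that `maybe_patch_concurrency` performs on startup: it scans the command-line arguments for `-P`/`--pool` and monkey-patches before anything else is imported. When Airflow calls `.run()` directly, this argv scan never happens, so no patching occurs.
   ```python
   def detect_pool(argv):
       """Simplified stand-in for Celery's maybe_patch_concurrency argv scan.

       Celery only sees these arguments when invoked from the command line;
       calling worker.run() directly in Python skips this step entirely.
       """
       for i, arg in enumerate(argv):
           if arg in ("-P", "--pool") and i + 1 < len(argv):
               return argv[i + 1]
           if arg.startswith("--pool="):
               return arg.split("=", 1)[1]
       return None

   pool = detect_pool(["worker", "-P", "eventlet"])
   if pool in ("eventlet", "gevent"):
       # The real code would import eventlet/gevent here and monkey-patch
       # the stdlib before the rest of the worker starts up.
       print("would monkey-patch for " + pool)
   ```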
   
   **How to reproduce it**:
   1. Clone [puckel's 
docker-airflow](https://github.com/puckel/docker-airflow/tree/1.10.9):
       ```
       git clone [email protected]:puckel/docker-airflow.git
       ```
   2. Add the following line to the Dockerfile:
       ```
       RUN pip install eventlet
       ```
       Then rebuild the image:
       ```
       docker build --rm -t puckel/docker-airflow:1.10.9 .
       ```
   3. Set `pool = eventlet` in 
[airflow.cfg](https://github.com/puckel/docker-airflow/blob/1.10.9/config/airflow.cfg#L509)
 (the file will be mounted by docker-compose).
   
   4. Spin up [the CeleryExecutor docker-compose](https://github.com/puckel/docker-airflow/blob/1.10.9/docker-compose-CeleryExecutor.yml):
       ```
       docker-compose -f docker-compose-CeleryExecutor.yml up -d
       ```
   5. Navigate to http://localhost:8080, and run an example DAG.
   6. Notice that no task ever gets to the running state.
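   One quick way to confirm the missing patch (an ad-hoc check, not an official Airflow diagnostic): run the snippet below with `docker exec` inside the worker container. Eventlet's monkey patch replaces `socket.socket` with a green socket class defined in an eventlet module, so an unpatched worker still reports the stdlib `socket` module.
   ```python
   import socket

   # eventlet.monkey_patch() swaps socket.socket for a green implementation,
   # so its defining module changes from the stdlib "socket".
   patched = socket.socket.__module__ != "socket"
   print("monkey-patched" if patched else "NOT monkey-patched")
   ```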
   
   **Solution**:
   Ideally this should be fixed in [Celery](https://github.com/celery/celery), but in the meantime it
might be good to have a solution here as well. Here is a patch that I applied 
to solve this (on Airflow 1.10.9):
   ```diff
   --- cli.py   2020-03-27 17:05:45.000000000 -0400
   +++ cli-new.py       2020-03-27 17:19:48.000000000 -0400
   @@ -1098,7 +1098,10 @@
        }
   
        if conf.has_option("celery", "pool"):
   -        options["pool"] = conf.get("celery", "pool")
   +        pool = conf.get("celery", "pool")
   +        options["pool"] = pool
   +        from celery import maybe_patch_concurrency
   +        maybe_patch_concurrency(['-P', pool])
   
        if args.daemon:
            pid, stdout, stderr, log_file = setup_locations("worker",
   ```
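   The same idea can also stand alone outside Airflow's `cli.py`. A minimal sketch (the helper name `patch_for_pool` is mine, not from Airflow or Celery) of patching by pool name, which must run before anything imports network or threading modules, mirroring where the diff above places the call:
   ```python
   def patch_for_pool(pool):
       """Monkey-patch the stdlib to match a Celery pool implementation.

       Must run before anything imports socket/threading/etc., which is
       why the call belongs right after reading the pool from config.
       """
       if pool == "eventlet":
           import eventlet
           eventlet.monkey_patch()
           return pool
       if pool == "gevent":
           from gevent import monkey
           monkey.patch_all()
           return pool
       return None  # prefork and solo pools need no patching

   patch_for_pool("prefork")  # no-op for the default pool
   ```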

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
