sstoefe opened a new issue #13675:
URL: https://github.com/apache/airflow/issues/13675


   
   **Apache Airflow version**: v2.0.0
   **Git Version**: release:2.0.0+ab5f770bfcd8c690cbe4d0825896325aca0beeca
   
   
   **Docker version**: Docker version 20.10.1, build 831ebeae96
   
   **Environment**:
   
   - **Cloud provider or hardware configuration**: local setup, Docker Engine in swarm mode, deployed with `docker stack deploy`
   - **OS** (e.g. from /etc/os-release): Manjaro Linux
   - **Kernel** (e.g. `uname -a`): 5.9.11
   - **Install tools**: 
     - docker airflow image apache/airflow:2.0.0-python3.8 (hash _fe4a64af9553_)
   - **Others**:
   
   **What happened**:
   
   When using `DockerSwarmOperator` (from either the `contrib` or the `providers` module) together with the default `enable_logging=True` option, tasks do not succeed and stay in state `running`. When checking `docker service logs` I can clearly see that the container ran and exited successfully. Airflow, however, does not recognize that the container finished and keeps the tasks in state `running`.
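   
   A quick way to confirm the same thing programmatically is to query the swarm with docker-py (the library the operator itself uses). This is only a diagnostic sketch and assumes the Docker socket is reachable at the default path:
   ```
   # Diagnostic sketch: list swarm services and the state of their tasks to
   # confirm that the service's container ran to completion even though the
   # Airflow task is still shown as "running".
   import docker
   
   client = docker.APIClient(base_url="unix://var/run/docker.sock")
   for service in client.services():
       name = service["Spec"]["Name"]
       for task in client.tasks(filters={"service": service["ID"]}):
           # e.g. "complete" once the container has exited successfully
           print(name, task["Status"]["State"])
   ```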
   
   However, when using `enable_logging=False` AND `auto_remove=False`, containers are recognized as finished and tasks correctly end up in state `success`. When using `enable_logging=False` and `auto_remove=True`, I get the following error message:
   ```
   {taskinstance.py:1396} ERROR - 404 Client Error: Not Found ("service 936om1s4zso10ye5ferhvwnxn not found")
   ```
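   
   My guess is that the 404 comes from the operator querying or removing a service that has already been auto-removed. The same error can be provoked directly with docker-py; this is only a sketch, and the service ID below is simply the one from the log above (it obviously won't exist elsewhere):
   ```
   # Sketch: asking the Docker API about a service that no longer exists raises
   # the same 404 "service ... not found" seen in the task log.
   import docker
   from docker.errors import NotFound
   
   client = docker.APIClient(base_url="unix://var/run/docker.sock")
   try:
       client.inspect_service("936om1s4zso10ye5ferhvwnxn")
   except NotFound as err:
       print(err)
   ```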
   
   
   **What you expected to happen**:
   
   When I run a DAG containing `DockerSwarmOperator` tasks, I expect the Docker containers to be distributed across the swarm and the container logs and states to be tracked correctly by the `DockerSwarmOperator`. That is, with `enable_logging=True` I would expect the TaskInstance's log to contain the logging output of the Docker container/service. Furthermore, with `auto_remove=True` I would expect the Docker services to be removed after the TaskInstance has finished successfully.
   
   It looks like something is broken with the `enable_logging` and `auto_remove=True` options.
   
   **How to reproduce it**:
   #### **`Dockerfile`**
   ```
   FROM apache/airflow:2.0.0-python3.8
   
   ARG DOCKER_GROUP_ID
   
   USER root
   
   RUN groupadd --gid $DOCKER_GROUP_ID docker \
       && usermod -aG docker airflow
   
   USER airflow
   ```
   
   The `airflow` user needs to be in the `docker` group to have access to the Docker daemon through the mounted socket.
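   
   To verify that this group setup actually gives the `airflow` user access to the socket, something like the following can be run inside the webserver or worker container (just a sanity-check sketch):
   ```
   # Sanity check: ping the Docker daemon through the mounted socket;
   # prints True if the airflow user has the necessary permissions.
   import docker
   
   client = docker.APIClient(base_url="unix://var/run/docker.sock")
   print(client.ping())
   ```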
   
   #### **build the Dockerfile**
   ```
   docker build --build-arg DOCKER_GROUP_ID=$(getent group docker | awk -F: '{print $3}') -t docker-swarm-bug .
   ```
   
   #### **`docker-stack.yml`**
   ```
   version: "3.2"
   networks:
     airflow:
   
   services:
     postgres:
       image: postgres:13.1
       environment:
         - POSTGRES_USER=airflow
         - POSTGRES_DB=airflow
         - POSTGRES_PASSWORD=airflow
         - PGDATA=/var/lib/postgresql/data/pgdata
       ports:
         - 5432:5432
       volumes:
         - /var/run/docker.sock:/var/run/docker.sock
         - ./database/data:/var/lib/postgresql/data/pgdata
         - ./database/logs:/var/lib/postgresql/data/log
       command: >
         postgres
           -c listen_addresses=*
           -c logging_collector=on
           -c log_destination=stderr
           -c max_connections=200
       networks:
         - airflow
     redis:
       image: redis:5.0.5
       environment:
         REDIS_HOST: redis
         REDIS_PORT: 6379
       ports:
         - 6379:6379
       networks:
         - airflow
     webserver:
       env_file:
         - .env
       image: docker-swarm-bug:latest
       ports:
         - 8080:8080
       volumes:
         - ./airflow_files/dags:/opt/airflow/dags
         - ./logs:/opt/airflow/logs
         - ./files:/opt/airflow/files
         - /var/run/docker.sock:/var/run/docker.sock
       deploy:
         restart_policy:
           condition: on-failure
           delay: 8s
           max_attempts: 3
       depends_on:
         - postgres
         - redis
       command: webserver
       healthcheck:
         test: ["CMD-SHELL", "[ -f /opt/airflow/airflow-webserver.pid ]"]
         interval: 30s
         timeout: 30s
         retries: 3
       networks:
         - airflow
     flower:
       image: docker-swarm-bug:latest
       env_file:
         - .env
       ports:
         - 5555:5555
       depends_on:
         - redis
       deploy:
         restart_policy:
           condition: on-failure
           delay: 8s
           max_attempts: 3
       volumes:
         - ./logs:/opt/airflow/logs
       command: celery flower
       networks:
         - airflow
     scheduler:
       image: docker-swarm-bug:latest
       env_file:
         - .env
       volumes:
         - ./airflow_files/dags:/opt/airflow/dags
         - ./logs:/opt/airflow/logs
         - ./files:/opt/airflow/files
         - /var/run/docker.sock:/var/run/docker.sock
       command: scheduler
       deploy:
         restart_policy:
           condition: on-failure
           delay: 8s
           max_attempts: 3
       networks:
         - airflow
     worker:
       image: docker-swarm-bug:latest
       env_file:
         - .env
       volumes:
         - ./airflow_files/dags:/opt/airflow/dags
         - ./logs:/opt/airflow/logs
         - ./files:/opt/airflow/files
         - /var/run/docker.sock:/var/run/docker.sock
       command: celery worker
       depends_on:
         - scheduler
   
       deploy:
         restart_policy:
           condition: on-failure
           delay: 8s
           max_attempts: 3
       networks:
         - airflow
     initdb:
       image: docker-swarm-bug:latest
       env_file:
         - .env
       volumes:
         - ./airflow_files/dags:/opt/airflow/dags
         - ./logs:/opt/airflow/logs
         - ./files:/opt/airflow/files
         - /var/run/docker.sock:/var/run/docker.sock
       entrypoint: /bin/bash
       deploy:
         restart_policy:
           condition: on-failure
           delay: 8s
           max_attempts: 5
       command: -c "airflow db init && airflow users create --firstname admin --lastname admin --email admin --password admin --username admin --role Admin"
       depends_on:
         - redis
         - postgres
       networks:
         - airflow
   ```
   
   #### **`docker_swarm_bug.py`**
   ```
   from airflow import DAG
   from airflow.operators.bash_operator import BashOperator
   from airflow.providers.docker.operators.docker_swarm import DockerSwarmOperator
   # you can also try DockerSwarmOperator from contrib module, shouldn't make a difference
   # from airflow.contrib.operators.docker_swarm_operator import DockerSwarmOperator
   
   default_args = {
       "owner": "airflow",
       "start_date": "2021-01-14"
   }
   
   with DAG(
       "docker_swarm_bug", default_args=default_args, schedule_interval="@once"
   ) as dag:
       start_op = BashOperator(
           task_id="start_op", bash_command="echo start testing multiple dockers",
       )
   
       docker_swarm = list()
       for i in range(16):
           docker_swarm.append(
               DockerSwarmOperator(
                   task_id=f"docker_swarm_{i}",
                   image="hello-world:latest",
                   force_pull=True,
                   auto_remove=True,
                   api_version="auto",
                   docker_url="unix://var/run/docker.sock",
                   network_mode="bridge",
                   enable_logging=False,
               )
           )
   
       finish_op = BashOperator(
           task_id="finish_op", bash_command="echo finish testing multiple dockers",
       )
   
       start_op >> docker_swarm >> finish_op
   ```
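   
   The DAG above is the `enable_logging=False` / `auto_remove=True` variant that produces the 404 error. To reproduce the hanging tasks instead, only the logging flag needs to change; a sketch of the operator call, with everything else exactly as in the loop above:
   ```
   # Drop-in variant of the operator in the loop above: with the default
   # enable_logging=True, the tasks stay in state "running" even though the
   # service's container finishes.
   from airflow.providers.docker.operators.docker_swarm import DockerSwarmOperator
   
   DockerSwarmOperator(
       task_id=f"docker_swarm_{i}",  # same loop variable as in the DAG above
       image="hello-world:latest",
       force_pull=True,
       auto_remove=True,
       api_version="auto",
       docker_url="unix://var/run/docker.sock",
       network_mode="bridge",
       enable_logging=True,
   )
   ```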
   
   #### **create directories, copy DAG and set permissions**
   ```
   mkdir -p airflow_files/dags
   cp docker_swarm_bug.py airflow_files/dags/
   mkdir logs
   mkdir files
   sudo chown -R 50000 airflow_files logs files
   ```
   UID 50000 is the ID of the `airflow` user inside the Docker images.
   
   #### **deploy `docker-stack.yml`**
   ```
   docker stack deploy --compose-file docker-stack.yml airflow
   ```
   
   #### **trigger the DAG `docker_swarm_bug` in the UI**
   
   **Anything else we need to know**:
   
   The problem occurs with the default option `enable_logging=True`; with logging disabled, `auto_remove=True` additionally triggers the 404 error shown above.

