noah-gil opened a new issue, #26718:
URL: https://github.com/apache/airflow/issues/26718

   ### Apache Airflow Provider(s)
   
   docker
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-docker==3.1.0
   
   ### Apache Airflow version
   
   2.4.0
   
   ### Operating System
   
   Debian GNU/Linux 11 (bullseye)
   
   ### Deployment
   
   Docker-Compose
   
   ### Deployment details
   
   Client: Docker Engine - Community
    Cloud integration: v1.0.28
    Version:           20.10.17
    API version:       1.41
    Go version:        go1.17.11
    Git commit:        100c701
    Built:             Mon Jun  6 23:03:17 2022
    OS/Arch:           linux/amd64
    Context:           default
    Experimental:      true
   
   Docker Compose: v2.7.0
   
   Using a slightly modified version of the example docker-compose.yaml:
   ```yaml
   # Licensed to the Apache Software Foundation (ASF) under one
   # or more contributor license agreements.  See the NOTICE file
   # distributed with this work for additional information
   # regarding copyright ownership.  The ASF licenses this file
   # to you under the Apache License, Version 2.0 (the
   # "License"); you may not use this file except in compliance
   # with the License.  You may obtain a copy of the License at
   #
   #   http://www.apache.org/licenses/LICENSE-2.0
   #
   # Unless required by applicable law or agreed to in writing,
   # software distributed under the License is distributed on an
   # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
   # KIND, either express or implied.  See the License for the
   # specific language governing permissions and limitations
   # under the License.
   #
   
   # Basic Airflow cluster configuration for CeleryExecutor with Redis and PostgreSQL.
   #
   # WARNING: This configuration is for local development. Do not use it in a production deployment.
   #
   # This configuration supports basic configuration using environment variables or an .env file
   # The following variables are supported:
   #
   # AIRFLOW_IMAGE_NAME           - Docker image name used to run Airflow.
   #                                Default: apache/airflow:2.4.0
   # AIRFLOW_UID                  - User ID in Airflow containers
   #                                Default: 50000
   # Those configurations are useful mostly in case of standalone testing/running Airflow in test/try-out mode
   #
   # _AIRFLOW_WWW_USER_USERNAME   - Username for the administrator account (if requested).
   #                                Default: airflow
   # _AIRFLOW_WWW_USER_PASSWORD   - Password for the administrator account (if requested).
   #                                Default: airflow
   # _PIP_ADDITIONAL_REQUIREMENTS - Additional PIP requirements to add when starting all containers.
   #                                Default: ''
   #
   # Feel free to modify this file to suit your needs.
   ---
   version: '3'
   x-airflow-common:
     &airflow-common
     # In order to add custom dependencies or upgrade provider packages you can use your extended image.
     # Comment the image line, place your Dockerfile in the directory where you placed the docker-compose.yaml
     # and uncomment the "build" line below, Then run `docker-compose build` to build the images.
     image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.4.0}
     # build: .
     environment:
       &airflow-common-env
       AIRFLOW__CORE__EXECUTOR: LocalExecutor
       AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
       # For backward compatibility, with Airflow <2.3
       AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
       AIRFLOW__CORE__FERNET_KEY: ''
       AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
       AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
       AIRFLOW__API__AUTH_BACKENDS: 'airflow.api.auth.backend.basic_auth'
       _PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:-}
       IS_LOCAL: 'true'
     volumes:
       - ./dags:/opt/airflow/dags
       - ./logs:/opt/airflow/logs
       - ./plugins:/opt/airflow/plugins
       - ./kube.conf:/opt/airflow/kube.conf
       - /var/run/docker.sock:/var/run/docker.sock
     user: "${AIRFLOW_UID:-50000}:0"
     group_add:
       - '1001' # Add user to docker group. Change value depending on gid of docker on your machine
     depends_on:
       &airflow-common-depends-on
       redis:
         condition: service_healthy
       postgres:
         condition: service_healthy
   
   services:
     postgres:
       image: postgres:13
       environment:
         POSTGRES_USER: airflow
         POSTGRES_PASSWORD: airflow
         POSTGRES_DB: airflow
       volumes:
         - postgres-db-volume:/var/lib/postgresql/data
       healthcheck:
         test: ["CMD", "pg_isready", "-U", "airflow"]
         interval: 5s
         retries: 5
       restart: always
   
     redis:
       image: redis:latest
       expose:
         - 6379
       healthcheck:
         test: ["CMD", "redis-cli", "ping"]
         interval: 5s
         timeout: 30s
         retries: 50
       restart: always
   
     airflow-webserver:
       <<: *airflow-common
       command: webserver
       ports:
         - 8080:8080
       healthcheck:
         test: ["CMD", "curl", "--fail", "http://localhost:8080/health"]
         interval: 10s
         timeout: 10s
         retries: 5
       restart: always
       depends_on:
         <<: *airflow-common-depends-on
         airflow-init:
           condition: service_completed_successfully
   
     airflow-scheduler:
       <<: *airflow-common
       command: scheduler
       healthcheck:
         test: ["CMD-SHELL", 'airflow jobs check --job-type SchedulerJob --hostname "$${HOSTNAME}"']
         interval: 10s
         timeout: 10s
         retries: 5
       restart: always
       depends_on:
         <<: *airflow-common-depends-on
         airflow-init:
           condition: service_completed_successfully
   
     airflow-init:
       <<: *airflow-common
       entrypoint: /bin/bash
       # yamllint disable rule:line-length
       command:
         - -c
         - |
           function ver() {
             printf "%04d%04d%04d%04d" $${1//./ }
           }
           airflow_version=$$(AIRFLOW__LOGGING__LOGGING_LEVEL=INFO && gosu airflow airflow version)
           airflow_version_comparable=$$(ver $${airflow_version})
           min_airflow_version=2.2.0
           min_airflow_version_comparable=$$(ver $${min_airflow_version})
           if (( airflow_version_comparable < min_airflow_version_comparable )); then
             echo
             echo -e "\033[1;31mERROR!!!: Too old Airflow version $${airflow_version}!\e[0m"
             echo "The minimum Airflow version supported: $${min_airflow_version}. Only use this or higher!"
             echo
             exit 1
           fi
           if [[ -z "${AIRFLOW_UID}" ]]; then
             echo
             echo -e "\033[1;33mWARNING!!!: AIRFLOW_UID not set!\e[0m"
             echo "If you are on Linux, you SHOULD follow the instructions below to set "
             echo "AIRFLOW_UID environment variable, otherwise files will be owned by root."
             echo "For other operating systems you can get rid of the warning with manually created .env file:"
             echo "    See: https://airflow.apache.org/docs/apache-airflow/stable/howto/docker-compose/index.html#setting-the-right-airflow-user"
             echo
           fi
           one_meg=1048576
           mem_available=$$(($$(getconf _PHYS_PAGES) * $$(getconf PAGE_SIZE) / one_meg))
           cpus_available=$$(grep -cE 'cpu[0-9]+' /proc/stat)
           disk_available=$$(df / | tail -1 | awk '{print $$4}')
           warning_resources="false"
           if (( mem_available < 4000 )) ; then
             echo
             echo -e "\033[1;33mWARNING!!!: Not enough memory available for Docker.\e[0m"
             echo "At least 4GB of memory required. You have $$(numfmt --to iec $$((mem_available * one_meg)))"
             echo
             warning_resources="true"
           fi
           if (( cpus_available < 2 )); then
             echo
             echo -e "\033[1;33mWARNING!!!: Not enough CPUS available for Docker.\e[0m"
             echo "At least 2 CPUs recommended. You have $${cpus_available}"
             echo
             warning_resources="true"
           fi
           if (( disk_available < one_meg * 10 )); then
             echo
             echo -e "\033[1;33mWARNING!!!: Not enough Disk space available for Docker.\e[0m"
             echo "At least 10 GBs recommended. You have $$(numfmt --to iec $$((disk_available * 1024 )))"
             echo
             warning_resources="true"
           fi
           if [[ $${warning_resources} == "true" ]]; then
             echo
             echo -e "\033[1;33mWARNING!!!: You have not enough resources to run Airflow (see above)!\e[0m"
             echo "Please follow the instructions to increase amount of resources available:"
             echo "   https://airflow.apache.org/docs/apache-airflow/stable/howto/docker-compose/index.html#before-you-begin"
             echo
           fi
           mkdir -p /sources/logs /sources/dags /sources/plugins
           chown -R "${AIRFLOW_UID}:0" /sources/{logs,dags,plugins}
           exec /entrypoint airflow version
       # yamllint enable rule:line-length
       environment:
         <<: *airflow-common-env
         _AIRFLOW_DB_UPGRADE: 'true'
         _AIRFLOW_WWW_USER_CREATE: 'true'
         _AIRFLOW_WWW_USER_USERNAME: ${_AIRFLOW_WWW_USER_USERNAME:-airflow}
         _AIRFLOW_WWW_USER_PASSWORD: ${_AIRFLOW_WWW_USER_PASSWORD:-airflow}
         _PIP_ADDITIONAL_REQUIREMENTS: ''
       user: "0:0"
       volumes:
         - .:/sources
   
     airflow-cli:
       <<: *airflow-common
       profiles:
         - debug
       environment:
         <<: *airflow-common-env
         CONNECTION_CHECK_MAX_COUNT: "0"
       # Workaround for entrypoint issue. See: https://github.com/apache/airflow/issues/16252
       command:
         - bash
         - -c
         - airflow
   
   volumes:
     postgres-db-volume:
   
   ```
   
   ### What happened
   
   I was trying to test running a task using the `@task.docker` decorator, so I set up the following DAG with a series of Docker tasks.
   
   ```python
   from airflow import DAG
   from airflow.decorators import task, dag
   
   from docker.types import Mount
   from datetime import datetime
   
   @dag(
       description='Run a series of Docker containers with outputs',
       start_date=datetime(2022, 1, 1),
       catchup=False,
       schedule_interval=None,
   )
   def docker_parallel_decorator():
       @task.docker(image="python:3.9-slim-bullseye", params={"expect_airflow": False})
       def container_a():
           print("Hello from Container A")
           return None
   
       @task.docker(image="python:3.9-slim-bullseye", params={"expect_airflow": False})
       def container_b():
           print("Hello from Container B")
           return None
   
       @task.docker(image="python:3.9-slim-bullseye", params={"expect_airflow": False})
       def container_c():
           print("Hello from Container C")
           return None
   
       container_a() >> container_b() >> container_c()
   
   docker_parallel_decorator()
   ```
   
   In the past, I've had success with the `DockerOperator`, so I expected the decorator to behave the same way. However, I received the following error in the task log:
   ```
   [2022-09-27, 17:37:03 UTC] {taskinstance.py:1851} ERROR - Task failed with exception
   Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/docker/decorators/docker.py", line 111, in execute
       filename=script_filename,
     File "/home/airflow/.local/lib/python3.7/site-packages/airflow/utils/python_virtualenv.py", line 128, in write_python_script
       template.stream(**jinja_context).dump(filename)
     File "/home/airflow/.local/lib/python3.7/site-packages/jinja2/environment.py", line 1618, in dump
       fp.writelines(iterable)
     File "/home/airflow/.local/lib/python3.7/site-packages/jinja2/environment.py", line 1613, in <genexpr>
       iterable = (x.encode(encoding, errors) for x in self)  # type: ignore
     File "/home/airflow/.local/lib/python3.7/site-packages/jinja2/environment.py", line 1662, in __next__
       return self._next()  # type: ignore
     File "/home/airflow/.local/lib/python3.7/site-packages/jinja2/environment.py", line 1354, in generate
       yield self.environment.handle_exception()
     File "/home/airflow/.local/lib/python3.7/site-packages/jinja2/environment.py", line 936, in handle_exception
       raise rewrite_traceback_stack(source=source)
     File "/home/airflow/.local/lib/python3.7/site-packages/airflow/utils/python_virtualenv_script.jinja2", line 23, in top-level template code
       {% if expect_airflow %}
       {% if expect_airflow %}
   jinja2.exceptions.UndefinedError: 'expect_***' is undefined
   ```
   
   ### What you think should happen instead
   
   I expected the Docker tasks to run the code in the provided Python function.
   
   ### How to reproduce
   
   1. Deploy Airflow from the provided docker-compose.yaml file
   2. Place the provided DAG into the `./dags` folder
   3. Manually trigger the `docker_parallel_decorator` from the web UI
   
   ### Anything else
   
   I have no experience with Jinja, so I don't know the specifics, but I was able to work around the error by patching the `/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/docker/decorators/docker.py` file in the `airflow-scheduler` service.
   
   First, I copied the file out of the container.
   ```bash
   docker compose cp airflow-scheduler:/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/docker/decorators/docker.py ./docker.py
   ```
   
   Then I changed the following snippet starting on line 101:
   
   ```python
               write_python_script(
                   jinja_context=dict(
                       op_args=self.op_args,
                       op_kwargs=self.op_kwargs,
                       pickling_library=self.pickling_library.__name__,
                       python_callable=self.python_callable.__name__,
                       python_callable_source=py_source,
                       string_args_global=False,
                   ),
                   filename=script_filename,
               )
   ```
   
   To this:
   
   ```python
               write_python_script(
                   jinja_context=dict(
                       op_args=self.op_args,
                       op_kwargs=self.op_kwargs,
                       pickling_library=self.pickling_library.__name__,
                       python_callable=self.python_callable.__name__,
                       python_callable_source=py_source,
                       string_args_global=False,
                       expect_airflow=False, # Added this line
                   ),
                   filename=script_filename,
               )
   ```
   
   Then I copied the file back into the container.
   
   ```bash
   docker compose cp ./docker.py airflow-scheduler:/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/docker/decorators/docker.py
   ```
   
   After that, the DAG ran without errors and produced the expected output in the logs.
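
The failure mode and the one-line workaround can be illustrated without Airflow or Jinja. The sketch below is only an analogy, not the provider's code: it uses the standard library's `string.Template` as a stand-in for the script template, because `substitute()` is similarly strict about names that the context does not supply. The template text, the abbreviated context dictionary, and the variable names `original_context`, `patched_context`, and `missing_name` are all assumptions made up for this illustration.

```python
from string import Template

# Stand-in for python_virtualenv_script.jinja2 (hypothetical template text):
# it references expect_airflow, so the caller has to supply that name.
script_template = Template(
    "expect_airflow = $expect_airflow\n"
    "callable_name = '$python_callable'\n"
)

# Abbreviated, illustrative version of the context built in docker.py:
# there is no expect_airflow key.
original_context = {"python_callable": "container_a", "string_args_global": False}

try:
    script_template.substitute(original_context)
    missing_name = None
except KeyError as exc:
    # substitute() raises KeyError for a placeholder with no matching key.
    missing_name = exc.args[0]

# The workaround from the issue: add the missing key before rendering.
patched_context = dict(original_context, expect_airflow=False)
rendered = script_template.substitute(patched_context)

print(missing_name)
print(rendered)
```

The first render fails because the template asks for a name the context never provides, and the second succeeds once the key is present. That matches the behaviour described above, where adding `expect_airflow=False` to `jinja_context` let the task run.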
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

