sanje2v opened a new issue #21832:
URL: https://github.com/apache/airflow/issues/21832


   ### Apache Airflow version
   
   2.2.4 (latest released)
   
   ### What happened
   
   I am using the Airflow 2.2.4 Docker image to run a DAG, `test_dag.py`, 
defined as follows:
   
   ```python
   from airflow.decorators import dag, task
   from airflow.utils import dates
   
   
   @dag(schedule_interval=None,
        start_date=dates.days_ago(1),
        catchup=False)
   def test_dag():
   
       @task.docker(image='company/my-repo',
                    api_version='auto',
                    docker_url='tcp://docker-socket-proxy:2375/',
                    auto_remove=True)
       def docker_task(inp):
           print(inp)
           return inp+1
   
       @task.python()
       def python_task(inp):
           print(inp)
   
       out = docker_task(10)
       python_task(out)
   
   
   _ = test_dag()
   ```
   
   The `Dockerfile` for `company/my-repo` is as follows:
   
   ```dockerfile
   FROM nvidia/cuda:11.2.2-runtime-ubuntu20.04
   
   USER root
   ARG DEBIAN_FRONTEND=noninteractive
   
   RUN apt-get update && apt-get install -y python3 python3-pip
   ```
   
   ### What you expected to happen
   
   I expected the DAG logs for `docker_task()` and `python_task()` to show 10 
and 11 as output, respectively.
   
   Instead, the task fails. The internal Airflow bootstrap command that 
unpickles the function definition of `docker_task()` inside the 
`company/my-repo` container (shipped in via the `__PYTHON_SCRIPT` environment 
variable) makes an **incorrect assumption**: that the bare name `python` is 
defined as an alias for either `/usr/bin/python2` or `/usr/bin/python3`. Most 
Linux Python installations require users to invoke `python2` or `python3` 
explicitly, and `python` is NOT defined even when `python3` is installed via 
the apt package manager.
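   The bootstrap mechanism can be sketched as follows (a simplified local 
illustration, not the actual Airflow code): the serialized script travels in 
the `__PYTHON_SCRIPT` environment variable as base64, is decoded and written 
to a file, and is then executed with an interpreter. Using `sys.executable` 
(or an explicit `python3`) here avoids assuming the bare `python` alias 
exists:
   
   ```python
   import base64
   import os
   import subprocess
   import sys
   import tempfile
   
   # Ship a trivial script the way the operator ships the pickled task:
   # base64-encoded in an environment variable.
   script = b"print(10 + 1)\n"
   os.environ["__PYTHON_SCRIPT"] = base64.b64encode(script).decode()
   
   # Inside the container, Airflow decodes the variable and writes it to disk.
   decoded = base64.b64decode(os.environ["__PYTHON_SCRIPT"])
   with tempfile.NamedTemporaryFile("wb", suffix=".py", delete=False) as f:
       f.write(decoded)
   
   # Airflow then invokes the bare name `python`; an explicit interpreter
   # such as sys.executable would work in images that only provide python3.
   out = subprocess.run([sys.executable, f.name], capture_output=True, text=True)
   print(out.stdout.strip())  # → 11
   ```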
   
   This error can be worked around for now by adding the following to the 
`Dockerfile` after the python3 package installation:
   `RUN apt-get install -y python-is-python3`
   
   But this should NOT be a requirement.
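   With that workaround applied, the `Dockerfile` above would look like this 
(a sketch; `python-is-python3` installs the `/usr/bin/python` → `python3` 
symlink that the operator's bootstrap command expects):
   
   ```dockerfile
   FROM nvidia/cuda:11.2.2-runtime-ubuntu20.04
   
   USER root
   ARG DEBIAN_FRONTEND=noninteractive
   
   # python-is-python3 provides the bare `python` name assumed by the operator
   RUN apt-get update && apt-get install -y python3 python3-pip python-is-python3
   ```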
   
   `Dockerfile`s using base Python images do not suffer from this problem, as 
those images define the `python` alias.
   
   
   The error logged is:
   ```
   [2022-02-26, 11:30:47 UTC] {docker.py:258} INFO - Starting docker container 
from image company/my-repo
   [2022-02-26, 11:30:48 UTC] {docker.py:320} INFO - + python -c 'import 
base64, os;x = base64.b64decode(os.environ["__PYTHON_SCRIPT"]);f = 
open("/tmp/script.py", "wb"); f.write(x);'
   [2022-02-26, 11:30:48 UTC] {docker.py:320} INFO - bash: python: command not 
found
   [2022-02-26, 11:30:48 UTC] {taskinstance.py:1700} ERROR - Task failed with 
exception
   Traceback (most recent call last):
     File 
"/home/airflow/.local/lib/python3.9/site-packages/airflow/models/taskinstance.py",
 line 1329, in _run_raw_task
       self._execute_task_with_callbacks(context)
     File 
"/home/airflow/.local/lib/python3.9/site-packages/airflow/models/taskinstance.py",
 line 1455, in _execute_task_with_callbacks
       result = self._execute_task(context, self.task)
     File 
"/home/airflow/.local/lib/python3.9/site-packages/airflow/models/taskinstance.py",
 line 1511, in _execute_task
       result = execute_callable(context=context)
     File 
"/home/airflow/.local/lib/python3.9/site-packages/airflow/providers/docker/decorators/docker.py",
 line 117, in execute
       return super().execute(context)
     File 
"/home/airflow/.local/lib/python3.9/site-packages/airflow/decorators/base.py", 
line 134, in execute
       return_value = super().execute(context)
     File 
"/home/airflow/.local/lib/python3.9/site-packages/airflow/providers/docker/operators/docker.py",
 line 390, in execute
       return self._run_image()
     File 
"/home/airflow/.local/lib/python3.9/site-packages/airflow/providers/docker/operators/docker.py",
 line 265, in _run_image
       return self._run_image_with_mounts(self.mounts + [tmp_mount], 
add_tmp_variable=True)
     File 
"/home/airflow/.local/lib/python3.9/site-packages/airflow/providers/docker/operators/docker.py",
 line 324, in _run_image_with_mounts
       raise AirflowException('docker container failed: ' + repr(result) + 
f"lines {res_lines}")
   airflow.exceptions.AirflowException: docker container failed: {'Error': 
None, 'StatusCode': 127}lines + python -c 'import base64, os;x = 
base64.b64decode(os.environ["__PYTHON_SCRIPT"]);f = open("/tmp/script.py", 
"wb"); f.write(x);'
   bash: python: command not found
   ```
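   The `StatusCode: 127` ("command not found") can be reproduced locally 
without docker by simulating a `PATH` that contains only `python3`, as in the 
image above (a sketch; the directory name is arbitrary):
   
   ```shell
   # Build a minimal PATH directory containing only a python3 symlink,
   # mimicking an apt-installed python3 with no `python` alias.
   bindir=$(mktemp -d)
   ln -s "$(command -v python3)" "$bindir/python3"
   
   # The bare `python` lookup fails, just like in the task log.
   PATH="$bindir" command -v python || echo "bash: python: command not found"
   
   # The explicit python3 name resolves fine.
   PATH="$bindir" command -v python3
   ```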
   
   ### How to reproduce
   
   _No response_
   
   ### Operating System
   
   Ubuntu 20.04 WSL 2
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Docker-Compose
   
   ### Deployment details
   
   _No response_
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

