jgmarcel opened a new issue #19028:
URL: https://github.com/apache/airflow/issues/19028


   ### Apache Airflow version
   
   2.2.0 (latest released)
   
   ### Operating System
   
   Ubuntu 20.04.3 LTS
   
   ### Versions of Apache Airflow Providers
   
   ```
   apache-airflow-providers-celery   | 2.1.0
   apache-airflow-providers-ftp      | 2.0.1
   apache-airflow-providers-http     | 2.0.1
   apache-airflow-providers-imap     | 2.0.1
   apache-airflow-providers-postgres | 2.3.0
   apache-airflow-providers-redis    | 2.0.1
   apache-airflow-providers-sqlite   | 2.0.1
   ```
   
   ### Deployment
   
   Virtualenv installation
   
   ### Deployment details
   
   I closely followed the installation instructions given at [Installation from PyPI](https://airflow.apache.org/docs/apache-airflow/stable/installation/installing-from-pypi.html), with `AIRFLOW_VERSION=2.2.0` and `PYTHON_VERSION=3.9`. The installation was performed as the initial user created by the Ubuntu installer, i.e. `uid=1000(ubuntu) gid=1000(ubuntu)`.
   
   Airflow is running locally on a single machine, which hosts the scheduler, 
webserver and worker. The database backend used is 
[PostgreSQL](https://airflow.apache.org/docs/apache-airflow/stable/howto/set-up-database.html#setting-up-a-postgresql-database),
 and the chosen executor is the [Celery 
executor](https://airflow.apache.org/docs/apache-airflow/stable/executor/celery.html).
   
   Airflow is running with systemd, using the instructions given 
[here](https://airflow.apache.org/docs/apache-airflow/stable/howto/run-with-systemd.html)
 and the unit files found 
[here](https://github.com/apache/airflow/tree/main/scripts/systemd). The only changes made are to the `User=` and `Group=` settings, as follows:
   ```
   User=ubuntu
   Group=ubuntu
   ```
   
   As stated in the [Ubuntu Server 
Guide](https://ubuntu.com/server/docs/security-users):
   
   > By default, the initial user created by the Ubuntu installer is a member 
of the group `sudo` which is added to the file `/etc/sudoers` as an authorized 
`sudo` user.
   
   Additionally, the requirements for impersonation to work, given in the [impersonation documentation](https://airflow.apache.org/docs/apache-airflow/stable/security/workload.html#impersonation), are strictly followed:
   
   ```
   $ sudo cat /etc/sudoers.d/90-cloud-init-users
   # Created by cloud-init v. 20.1-10-g71af48df-0ubuntu5 on Wed, 13 May 2020 
14:17:09 +0000
   
   # User rules for ubuntu
   ubuntu ALL=(ALL) NOPASSWD:ALL
   ```
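
   As a quick sanity check (a minimal sketch on my part; the probe path is made up), the `ubuntu` user can indeed hand a file over to `produser` through passwordless `sudo`, which is exactly the mechanism the worker logs below show succeeding:

   ```
   import pathlib
   import subprocess

   # Sanity check: confirm the user running Airflow can chown a file to
   # "produser" via sudo without a password. "-n" makes sudo fail instead
   # of prompting if a password would be required.
   probe = pathlib.Path("/tmp/impersonation_probe")  # hypothetical path
   probe.touch()
   subprocess.run(["sudo", "-n", "chown", "produser", str(probe)], check=True)
   print(probe.stat().st_uid)  # prints produser's uid (115 here) on success
   ```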
   
   Since the requirements are met, the config `core:default_impersonation` is 
set as follows:
   ```
   # If set, tasks without a ``run_as_user`` argument will be run with this user
   # Can be used to de-elevate a sudo user running Airflow when executing tasks
   default_impersonation = produser
   ```
   
   Here, `produser` is an existing Unix user on the host machine:
   
   ```
   $ id produser
   uid=115(produser) gid=122(produser) groups=122(produser)
   ```
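
   For completeness, `core:default_impersonation` is just a default for the per-task `run_as_user` argument. A minimal sketch of the per-task equivalent (DAG and task ids are made up):

   ```
   from datetime import datetime

   from airflow import DAG
   from airflow.operators.bash import BashOperator

   # Minimal sketch: the per-task equivalent of core:default_impersonation.
   # DAG id and task id are hypothetical; "produser" is the target Unix user.
   with DAG(
       dag_id="impersonation_demo",
       start_date=datetime(2021, 10, 1),
       schedule_interval=None,
   ) as dag:
       BashOperator(
           task_id="whoami",
           bash_command="whoami",  # prints "produser" when impersonation works
           run_as_user="produser",
       )
   ```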
   
   ### What happened
   
   After finishing the installation and opening the Airflow UI, I tested the setup by manually triggering the `tutorial_etl_dag` example DAG. The execution failed immediately. Here is what the worker logs show (the worker was stopped right after the error to keep the logs short):
   
   <details><summary>journalctl -u airflow-worker.service -b</summary>
   
   ```
   -- Logs begin at Wed 2020-05-13 14:16:58 UTC, end at Sun 2021-10-17 00:25:01 
UTC. --
   Oct 16 23:39:54 ip-172-19-250-178 systemd[1]: Started Airflow celery worker 
daemon.
   Oct 16 23:39:57 ip-172-19-250-178 airflow[4184]: [2021-10-16 23:39:57 +0000] 
[4184] [INFO] Starting gunicorn 20.1.0
   Oct 16 23:39:57 ip-172-19-250-178 airflow[4184]: [2021-10-16 23:39:57 +0000] 
[4184] [INFO] Listening at: http://0.0.0.0:8793 (4184)
   Oct 16 23:39:57 ip-172-19-250-178 airflow[4184]: [2021-10-16 23:39:57 +0000] 
[4184] [INFO] Using worker: sync
   Oct 16 23:39:57 ip-172-19-250-178 airflow[4189]: [2021-10-16 23:39:57 +0000] 
[4189] [INFO] Booting worker with pid: 4189
   Oct 16 23:39:57 ip-172-19-250-178 airflow[4199]: [2021-10-16 23:39:57 +0000] 
[4199] [INFO] Booting worker with pid: 4199
   Oct 16 23:39:58 ip-172-19-250-178 airflow[4020]:
   Oct 16 23:39:58 ip-172-19-250-178 airflow[4020]:  -------------- 
celery@ip-172-19-250-178 v5.1.2 (sun-harmonics)
   Oct 16 23:39:58 ip-172-19-250-178 airflow[4020]: --- ***** -----
   Oct 16 23:39:58 ip-172-19-250-178 airflow[4020]: -- ******* ---- 
Linux-5.11.0-1019-aws-x86_64-with-glibc2.31 2021-10-16 23:39:58
   Oct 16 23:39:58 ip-172-19-250-178 airflow[4020]: - *** --- * ---
   Oct 16 23:39:58 ip-172-19-250-178 airflow[4020]: - ** ---------- [config]
   Oct 16 23:39:58 ip-172-19-250-178 airflow[4020]: - ** ---------- .> app:     
    airflow.executors.celery_executor:0x7f4f6c0350a0
   Oct 16 23:39:58 ip-172-19-250-178 airflow[4020]: - ** ---------- .> 
transport:   redis://localhost:6379/0
   Oct 16 23:39:58 ip-172-19-250-178 airflow[4020]: - ** ---------- .> results: 
    postgresql://airflow:**@localhost:5432/airflow
   Oct 16 23:39:58 ip-172-19-250-178 airflow[4020]: - *** --- * --- .> 
concurrency: 16 (prefork)
   Oct 16 23:39:58 ip-172-19-250-178 airflow[4020]: -- ******* ---- .> task 
events: OFF (enable -E to monitor tasks in this worker)
   Oct 16 23:39:58 ip-172-19-250-178 airflow[4020]: --- ***** -----
   Oct 16 23:39:58 ip-172-19-250-178 airflow[4020]:  -------------- [queues]
   Oct 16 23:39:58 ip-172-19-250-178 airflow[4020]:                 .> default  
        exchange=default(direct) key=default
   Oct 16 23:39:58 ip-172-19-250-178 airflow[4020]:
   Oct 16 23:39:58 ip-172-19-250-178 airflow[4020]: [tasks]
   Oct 16 23:39:58 ip-172-19-250-178 airflow[4020]:   . 
airflow.executors.celery_executor.execute_command
   Oct 16 23:39:59 ip-172-19-250-178 airflow[4020]: [2021-10-16 23:39:59,748: 
INFO/MainProcess] Connected to redis://localhost:6379/0
   Oct 16 23:39:59 ip-172-19-250-178 airflow[4020]: [2021-10-16 23:39:59,756: 
INFO/MainProcess] mingle: searching for neighbors
   Oct 16 23:40:00 ip-172-19-250-178 airflow[4020]: [2021-10-16 23:40:00,774: 
INFO/MainProcess] mingle: all alone
   Oct 16 23:40:00 ip-172-19-250-178 airflow[4020]: [2021-10-16 23:40:00,790: 
INFO/MainProcess] celery@ip-172-19-250-178 ready.
   Oct 16 23:40:02 ip-172-19-250-178 airflow[4020]: [2021-10-16 23:40:02,817: 
INFO/MainProcess] Events of group {task} enabled by remote.
   Oct 16 23:42:10 ip-172-19-250-178 airflow[4020]: [2021-10-16 23:42:10,310: 
INFO/MainProcess] Task 
airflow.executors.celery_executor.execute_command[400d064a-4849-4203-95ba-f5744bd3313b]
 received
   Oct 16 23:42:10 ip-172-19-250-178 airflow[4284]: [2021-10-16 23:42:10,355: 
INFO/ForkPoolWorker-15] Executing command in Celery: ['airflow', 'tasks', 
'run', 'tutorial_etl_dag', 'extract', 
'manual__2021-10-16T23:42:09.773269+00:00', '--local', '--subdir', 
'/opt/conda/envs/airflow/lib/python3.9/site-packages/airflow/example_dags/tutorial_etl_dag.py']
   Oct 16 23:42:10 ip-172-19-250-178 airflow[4284]: [2021-10-16 23:42:10,355: 
INFO/ForkPoolWorker-15] Celery task ID: 400d064a-4849-4203-95ba-f5744bd3313b
   Oct 16 23:42:10 ip-172-19-250-178 airflow[4664]: [2021-10-16 23:42:10,413: 
INFO/ForkPoolWorker-15] Filling up the DagBag from 
/opt/conda/envs/airflow/lib/python3.9/site-packages/airflow/example_dags/tutorial_etl_dag.py
   Oct 16 23:42:10 ip-172-19-250-178 airflow[4664]: [2021-10-16 23:42:10,479: 
WARNING/ForkPoolWorker-15] Could not import DAGs in 
example_kubernetes_executor_config.py: No module named 'kubernetes'
   Oct 16 23:42:10 ip-172-19-250-178 airflow[4664]: [2021-10-16 23:42:10,479: 
WARNING/ForkPoolWorker-15] Install kubernetes dependencies with: pip install 
apache-airflow['cncf.kubernetes']
   Oct 16 23:42:10 ip-172-19-250-178 airflow[4664]: [2021-10-16 23:42:10,552: 
WARNING/ForkPoolWorker-15] Running <TaskInstance: tutorial_etl_dag.extract 
manual__2021-10-16T23:42:09.773269+00:00 [queued]> on host ip-172-19-250-178
   Oct 16 23:42:10 ip-172-19-250-178 airflow[4664]: [2021-10-16 23:42:10,552: 
WARNING/ForkPoolWorker-15]
   Oct 16 23:42:10 ip-172-19-250-178 sudo[4671]:   ubuntu : TTY=unknown ; PWD=/ 
; USER=root ; COMMAND=/usr/bin/chown produser /tmp/tmp51f_x5wa
   Oct 16 23:42:10 ip-172-19-250-178 sudo[4671]: pam_unix(sudo:session): 
session opened for user root by (uid=0)
   Oct 16 23:42:10 ip-172-19-250-178 sudo[4671]: pam_unix(sudo:session): 
session closed for user root
   Oct 16 23:42:10 ip-172-19-250-178 airflow[4664]: [2021-10-16 23:42:10,626: 
ERROR/ForkPoolWorker-15] Failed to execute task [Errno 1] Operation not 
permitted: '/tmp/tmpsa9pohg6'.
   Oct 16 23:42:10 ip-172-19-250-178 airflow[4664]: Traceback (most recent call 
last):
   Oct 16 23:42:10 ip-172-19-250-178 airflow[4664]:   File 
"/opt/conda/envs/airflow/lib/python3.9/site-packages/airflow/executors/celery_executor.py",
 line 121, in _execute_in_fork
   Oct 16 23:42:10 ip-172-19-250-178 airflow[4664]:     args.func(args)
   Oct 16 23:42:10 ip-172-19-250-178 airflow[4664]:   File 
"/opt/conda/envs/airflow/lib/python3.9/site-packages/airflow/cli/cli_parser.py",
 line 48, in command
   Oct 16 23:42:10 ip-172-19-250-178 airflow[4664]:     return func(*args, 
**kwargs)
   Oct 16 23:42:10 ip-172-19-250-178 airflow[4664]:   File 
"/opt/conda/envs/airflow/lib/python3.9/site-packages/airflow/utils/cli.py", 
line 92, in wrapper
   Oct 16 23:42:10 ip-172-19-250-178 airflow[4664]:     return f(*args, 
**kwargs)
   Oct 16 23:42:10 ip-172-19-250-178 airflow[4664]:   File 
"/opt/conda/envs/airflow/lib/python3.9/site-packages/airflow/cli/commands/task_command.py",
 line 292, in task_run
   Oct 16 23:42:10 ip-172-19-250-178 airflow[4664]:     
_run_task_by_selected_method(args, dag, ti)
   Oct 16 23:42:10 ip-172-19-250-178 airflow[4664]:   File 
"/opt/conda/envs/airflow/lib/python3.9/site-packages/airflow/cli/commands/task_command.py",
 line 105, in _run_task_by_selected_method
   Oct 16 23:42:10 ip-172-19-250-178 airflow[4664]:     
_run_task_by_local_task_job(args, ti)
   Oct 16 23:42:10 ip-172-19-250-178 airflow[4664]:   File 
"/opt/conda/envs/airflow/lib/python3.9/site-packages/airflow/cli/commands/task_command.py",
 line 163, in _run_task_by_local_task_job
   Oct 16 23:42:10 ip-172-19-250-178 airflow[4664]:     run_job.run()
   Oct 16 23:42:10 ip-172-19-250-178 airflow[4664]:   File 
"/opt/conda/envs/airflow/lib/python3.9/site-packages/airflow/jobs/base_job.py", 
line 245, in run
   Oct 16 23:42:10 ip-172-19-250-178 airflow[4664]:     self._execute()
   Oct 16 23:42:10 ip-172-19-250-178 airflow[4664]:   File 
"/opt/conda/envs/airflow/lib/python3.9/site-packages/airflow/jobs/local_task_job.py",
 line 78, in _execute
   Oct 16 23:42:10 ip-172-19-250-178 airflow[4664]:     self.task_runner = 
get_task_runner(self)
   Oct 16 23:42:10 ip-172-19-250-178 airflow[4664]:   File 
"/opt/conda/envs/airflow/lib/python3.9/site-packages/airflow/task/task_runner/__init__.py",
 line 63, in get_task_runner
   Oct 16 23:42:10 ip-172-19-250-178 airflow[4664]:     task_runner = 
task_runner_class(local_task_job)
   Oct 16 23:42:10 ip-172-19-250-178 airflow[4664]:   File 
"/opt/conda/envs/airflow/lib/python3.9/site-packages/airflow/task/task_runner/standard_task_runner.py",
 line 35, in __init__
   Oct 16 23:42:10 ip-172-19-250-178 airflow[4664]:     
super().__init__(local_task_job)
   Oct 16 23:42:10 ip-172-19-250-178 airflow[4664]:   File 
"/opt/conda/envs/airflow/lib/python3.9/site-packages/airflow/task/task_runner/base_task_runner.py",
 line 91, in __init__
   Oct 16 23:42:10 ip-172-19-250-178 airflow[4664]:     
os.chown(self._error_file.name, getpwnam(self.run_as_user).pw_uid, -1)
   Oct 16 23:42:10 ip-172-19-250-178 airflow[4664]: PermissionError: [Errno 1] 
Operation not permitted: '/tmp/tmpsa9pohg6'
   Oct 16 23:42:10 ip-172-19-250-178 airflow[4284]: [2021-10-16 23:42:10,645: 
ERROR/ForkPoolWorker-15] Task 
airflow.executors.celery_executor.execute_command[400d064a-4849-4203-95ba-f5744bd3313b]
 raised unexpected: AirflowException('Celery command failed on host: 
ip-172-19-250-178')
   Oct 16 23:42:10 ip-172-19-250-178 airflow[4284]: Traceback (most recent call 
last):
   Oct 16 23:42:10 ip-172-19-250-178 airflow[4284]:   File 
"/opt/conda/envs/airflow/lib/python3.9/site-packages/celery/app/trace.py", line 
450, in trace_task
   Oct 16 23:42:10 ip-172-19-250-178 airflow[4284]:     R = retval = fun(*args, 
**kwargs)
   Oct 16 23:42:10 ip-172-19-250-178 airflow[4284]:   File 
"/opt/conda/envs/airflow/lib/python3.9/site-packages/celery/app/trace.py", line 
731, in __protected_call__
   Oct 16 23:42:10 ip-172-19-250-178 airflow[4284]:     return self.run(*args, 
**kwargs)
   Oct 16 23:42:10 ip-172-19-250-178 airflow[4284]:   File 
"/opt/conda/envs/airflow/lib/python3.9/site-packages/airflow/executors/celery_executor.py",
 line 90, in execute_command
   Oct 16 23:42:10 ip-172-19-250-178 airflow[4284]:     
_execute_in_fork(command_to_exec, celery_task_id)
   Oct 16 23:42:10 ip-172-19-250-178 airflow[4284]:   File 
"/opt/conda/envs/airflow/lib/python3.9/site-packages/airflow/executors/celery_executor.py",
 line 101, in _execute_in_fork
   Oct 16 23:42:10 ip-172-19-250-178 airflow[4284]:     raise 
AirflowException('Celery command failed on host: ' + get_hostname())
   Oct 16 23:42:10 ip-172-19-250-178 airflow[4284]: 
airflow.exceptions.AirflowException: Celery command failed on host: 
ip-172-19-250-178
   Oct 16 23:42:39 ip-172-19-250-178 airflow[4189]: [2021-10-16 23:42:39 +0000] 
[4189] [INFO] Worker exiting (pid: 4189)
   Oct 16 23:42:39 ip-172-19-250-178 systemd[1]: Stopping Airflow celery worker 
daemon...
   Oct 16 23:42:39 ip-172-19-250-178 airflow[4020]: worker: Warm shutdown 
(MainProcess)
   Oct 16 23:42:39 ip-172-19-250-178 airflow[4199]: [2021-10-16 23:42:39 +0000] 
[4199] [INFO] Worker exiting (pid: 4199)
   Oct 16 23:42:39 ip-172-19-250-178 airflow[5006]: [2021-10-16 23:42:39 +0000] 
[5006] [INFO] Booting worker with pid: 5006
   Oct 16 23:42:39 ip-172-19-250-178 airflow[4184]: [2021-10-16 23:42:39 +0000] 
[4184] [INFO] Handling signal: term
   Oct 16 23:42:39 ip-172-19-250-178 airflow[5006]: [2021-10-16 23:42:39 +0000] 
[5006] [INFO] Worker exiting (pid: 5006)
   Oct 16 23:42:39 ip-172-19-250-178 airflow[4184]: [2021-10-16 23:42:39 +0000] 
[4184] [INFO] Shutting down: Master
   Oct 16 23:42:41 ip-172-19-250-178 systemd[1]: airflow-worker.service: 
Succeeded.
   Oct 16 23:42:41 ip-172-19-250-178 systemd[1]: Stopped Airflow celery worker 
daemon.
   ```
   </details>
   
   It seems that, although `sudo` is correctly used to set file ownership:
   
   ```
   Oct 16 23:42:10 ip-172-19-250-178 sudo[4671]:   ubuntu : TTY=unknown ; PWD=/ 
; USER=root ; COMMAND=/usr/bin/chown produser /tmp/tmp51f_x5wa
   Oct 16 23:42:10 ip-172-19-250-178 sudo[4671]: pam_unix(sudo:session): 
session opened for user root by (uid=0)
   Oct 16 23:42:10 ip-172-19-250-178 sudo[4671]: pam_unix(sudo:session): 
session closed for user root
   ```
   
   a `PermissionError` is raised immediately afterwards:
   
   ```
   Oct 16 23:42:10 ip-172-19-250-178 airflow[4664]: [2021-10-16 23:42:10,626: 
ERROR/ForkPoolWorker-15] Failed to execute task [Errno 1] Operation not 
permitted: '/tmp/tmpsa9pohg6'.
   Oct 16 23:42:10 ip-172-19-250-178 airflow[4664]: Traceback (most recent call 
last):
   .
   .
   .
   Oct 16 23:42:10 ip-172-19-250-178 airflow[4664]: PermissionError: [Errno 1] 
Operation not permitted: '/tmp/tmpsa9pohg6'
   ```
   
   Note that the file names above refer to two different temporary files, and only the first was actually handed over to `produser`:
   
   ```
   $ /bin/ls -l /tmp/tmp51f_x5wa /tmp/tmpsa9pohg6
   -rw------- 1 produser ubuntu 8984 Oct 16 23:42 /tmp/tmp51f_x5wa
   -rw------- 1 ubuntu   ubuntu    0 Oct 16 23:42 /tmp/tmpsa9pohg6
   ```
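
   Putting the `sudo` log and the traceback together, the code path in `base_task_runner.py` appears to change ownership of two separate temporary files in two different ways. A simplified sketch of my reading of it (not Airflow's exact code):

   ```
   import os
   import subprocess
   import tempfile
   from pwd import getpwnam

   run_as_user = "produser"  # from core:default_impersonation

   # 1) The per-task copy of the configuration is chowned through sudo; this
   #    matches the successful log entry
   #    "COMMAND=/usr/bin/chown produser /tmp/tmp51f_x5wa".
   cfg_file = tempfile.NamedTemporaryFile(delete=False)
   subprocess.check_call(["sudo", "chown", run_as_user, cfg_file.name])

   # 2) The error file is chowned with a plain os.chown() from the worker
   #    process itself. The worker runs as the unprivileged "ubuntu" user,
   #    and giving a file away to another user without root privileges
   #    raises EPERM -- the PermissionError on '/tmp/tmpsa9pohg6' above.
   error_file = tempfile.NamedTemporaryFile(delete=True)
   os.chown(error_file.name, getpwnam(run_as_user).pw_uid, -1)
   ```

   If that reading is correct, the passwordless `sudo` rule never gets a chance to help with the second file, because `os.chown()` is called directly by the unprivileged worker process and bypasses `sudo` entirely.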
   
   ### What you expected to happen
   
   The example DAG was expected to run successfully with the config `core:default_impersonation` set.
   
   ### How to reproduce
   
   In an installation following the deployment details given above, with the config `core:default_impersonation` set, the example DAG (or any other DAG, for that matter) fails every time. With the `core:default_impersonation` config commented out in `airflow.cfg`, the error does **not** occur.
   
   ### Anything else
   
   I would like to know whether this behavior is indeed a bug or the result of an installation oversight on my part. If the latter, what should I have done differently?
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

