dolfinus opened a new issue, #28402:
URL: https://github.com/apache/airflow/issues/28402
### Description
I'm currently working with a managed Kubernetes cluster that uses the vSphere CSI
driver, which supports only PVCs with accessMode=ReadWriteOnce. So I cannot
mount the same DAGs volume into all the pods (webserver, scheduler, etc.).
I also cannot use gitSync, because the k8s cluster has no access to the git
server (for security reasons). However, access in the other direction exists:
git -> CI runner -> k8s cluster.
I wish there were some way to push DAGs from git to Airflow without
implementing an overcomplicated process for deploying DAGs to the cluster. For
example, my CI runner could simply push them to the volume mounted into the
dagProcessor; the dagProcessor would then parse all the DAGs and save them into
the database, and Airflow could execute them from there.
All the required components (a standalone dagProcessor, saving DAGs to the
database, reading DAGs from the database) already exist: after the DAGs are
parsed, I can see their source code in the webserver. But when I try to run the
`tutorial` DAG, I get an exception:
```
[2022-12-16, 08:51:21 UTC] {taskinstance.py:1165} INFO - Dependencies all met for <TaskInstance: tutorial.print_date manual__2022-12-16T08:51:20.145317+00:00 [queued]>
[2022-12-16, 08:51:21 UTC] {taskinstance.py:1165} INFO - Dependencies all met for <TaskInstance: tutorial.print_date manual__2022-12-16T08:51:20.145317+00:00 [queued]>
[2022-12-16, 08:51:21 UTC] {taskinstance.py:1362} INFO - --------------------------------------------------------------------------------
[2022-12-16, 08:51:21 UTC] {taskinstance.py:1363} INFO - Starting attempt 1 of 2
[2022-12-16, 08:51:21 UTC] {taskinstance.py:1364} INFO - --------------------------------------------------------------------------------
[2022-12-16, 08:51:21 UTC] {taskinstance.py:1383} INFO - Executing <Task(BashOperator): print_date> on 2022-12-16 08:51:20.145317+00:00
[2022-12-16, 08:51:21 UTC] {standard_task_runner.py:54} INFO - Started process 11404 to run task
[2022-12-16, 08:51:21 UTC] {standard_task_runner.py:82} INFO - Running: ['airflow', 'tasks', 'run', 'tutorial', 'print_date', 'manual__2022-12-16T08:51:20.145317+00:00', '--job-id', '1015', '--raw', '--subdir', 'DAGS_FOLDER/tutorial.py', '--cfg-path', '/tmp/tmp40xmsmk9']
[2022-12-16, 08:51:21 UTC] {standard_task_runner.py:83} INFO - Job 1015: Subtask print_date
[2022-12-16, 08:51:21 UTC] {dagbag.py:525} INFO - Filling up the DagBag from /opt/airflow/dags/tutorial.py
[2022-12-16, 08:51:21 UTC] {standard_task_runner.py:107} ERROR - Failed to execute job 1015 for task print_date (Dag 'tutorial' could not be found; either it does not exist or it failed to parse.; 11404)
[2022-12-16, 08:51:21 UTC] {local_task_job.py:164} INFO - Task exited with return code 1
[2022-12-16, 08:51:21 UTC] {local_task_job.py:273} INFO - 0 downstream tasks scheduled from follow-on schedule check
```
This is caused by `airflow.cli.commands.task_command.task_run` loading the DAG
source code only from the local file system of the worker/scheduler:
https://github.com/apache/airflow/blob/3bee4818e5d8f3ad8c1792453efb7d0c93a0236f/airflow/cli/commands/task_command.py#L378
https://github.com/apache/airflow/blob/3bee4818e5d8f3ad8c1792453efb7d0c93a0236f/airflow/utils/cli.py#L225-L226
https://github.com/apache/airflow/blob/3bee4818e5d8f3ad8c1792453efb7d0c93a0236f/airflow/models/dagbag.py#L98
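To make the failure mode concrete, here is a minimal, self-contained sketch of a file-system-only lookup. It is not Airflow's code: `DAG`, `fill_dagbag_from_folder` and `get_dag` are simplified, hypothetical stand-ins that only mimic the behaviour described above (execute every `.py` file in the DAGs folder, collect the DAG objects, and fail with the message seen in the log when the `dag_id` is missing).

```python
from __future__ import annotations

import pathlib


class DAG:
    """Hypothetical stand-in for airflow.DAG; it only records its dag_id."""

    def __init__(self, dag_id: str) -> None:
        self.dag_id = dag_id


def fill_dagbag_from_folder(dags_folder: pathlib.Path) -> dict[str, DAG]:
    """Execute every .py file in the folder and collect the DAG objects it defines."""
    dags: dict[str, DAG] = {}
    for path in sorted(dags_folder.glob("*.py")):
        namespace = {"DAG": DAG}
        exec(path.read_text(), namespace)
        for value in namespace.values():
            if isinstance(value, DAG):
                dags[value.dag_id] = value
    return dags


def get_dag(dags_folder: pathlib.Path, dag_id: str) -> DAG:
    """Only the local file system is consulted, so a worker without the files fails."""
    dagbag = fill_dagbag_from_folder(dags_folder)
    if dag_id not in dagbag:
        raise LookupError(
            f"Dag {dag_id!r} could not be found; either it does not exist or it failed to parse."
        )
    return dagbag[dag_id]
```

On a worker whose DAGs folder is empty (because the PVC is mounted only into the dagProcessor), `get_dag(folder, "tutorial")` raises, even though the DAG is already stored in the database.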
My proposal: add a `--read-dags-from-db` argument to the `airflow tasks` CLI
commands (at least to those that require only read access to DAGs), and a
config option in the `[scheduler]` section that makes the scheduler pass this
argument to the task runner.
This would allow fetching the DAG source code from the database instead of the
file system, so the dagProcessor would be the only pod that needs access to the
DAGs PVC.
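The proposal above could be wired roughly as follows. This is only a sketch under stated assumptions: the `[scheduler]` option name `read_dags_from_db` is hypothetical, the `--read-dags-from-db` flag is the one proposed here, and `make_task_run_parser` is a reduced stand-in for the real `airflow tasks run` parser, not Airflow's actual CLI definition. The base command mirrors the one in the log above.

```python
from __future__ import annotations

import argparse
import configparser


def build_task_command(base_command: list[str], config: configparser.ConfigParser) -> list[str]:
    """Scheduler side: append the proposed flag if the hypothetical option is enabled."""
    command = list(base_command)
    # Hypothetical option name; fallback=False also covers a missing [scheduler] section.
    if config.getboolean("scheduler", "read_dags_from_db", fallback=False):
        command.append("--read-dags-from-db")
    return command


def make_task_run_parser() -> argparse.ArgumentParser:
    """Task side: a reduced, hypothetical stand-in for the `airflow tasks run` parser."""
    parser = argparse.ArgumentParser(prog="airflow tasks run")
    parser.add_argument("dag_id")
    parser.add_argument("task_id")
    parser.add_argument("run_id")
    parser.add_argument("--job-id")
    parser.add_argument("--raw", action="store_true")
    parser.add_argument("--subdir")
    parser.add_argument("--cfg-path")
    # Flag proposed in this issue; not taken from the Airflow CLI.
    parser.add_argument("--read-dags-from-db", action="store_true")
    return parser


def dag_source(args: argparse.Namespace) -> str:
    """Decide where the task runner would load the DAG from."""
    return "database" if args.read_dags_from_db else f"file system ({args.subdir})"
```

Keeping the decision in a single boolean means the default behaviour (reading from `DAGS_FOLDER`) is unchanged unless the operator explicitly opts in.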
The proposed flag could also eliminate the need to add a gitSync sidecar to
every pod running Airflow components, though not in all cases. For example, if
someone places a Python module in the DAGs folder and imports it in a DAG, this
will not work: the module's contents are not saved in the database, so the DAG
import will fail if the module is not present in the worker's file system.
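The limitation above can be reproduced without Airflow. In this sketch the module name `dag_helpers_28402` and both source strings are hypothetical: `DAG_SOURCE` plays the role of the DAG file text stored in the database, and the helper module exists only in the dagProcessor's DAGs folder. Executing the same text with and without that folder on `sys.path` shows why storing the DAG file alone is not enough.

```python
from __future__ import annotations

import importlib
import sys

# Hypothetical helper module that lives next to the DAG file in the dags folder.
HELPER_MODULE = "dag_helpers_28402"
HELPER_SOURCE = "def greet():\n    return 'hello'\n"
# Hypothetical DAG file text: only this string would be available from the database.
DAG_SOURCE = f"import {HELPER_MODULE}\nGREETING = {HELPER_MODULE}.greet()\n"


def run_dag_source(source: str, dags_folder: str | None = None) -> dict:
    """Execute DAG source text, optionally with a dags folder on sys.path."""
    if dags_folder is not None:
        sys.path.insert(0, dags_folder)
    importlib.invalidate_caches()
    namespace: dict = {}
    try:
        exec(source, namespace)
    finally:
        if dags_folder is not None:
            sys.path.remove(dags_folder)
            # Forget the helper so later calls behave like a fresh worker process.
            sys.modules.pop(HELPER_MODULE, None)
    return namespace
```

Calling `run_dag_source(DAG_SOURCE)` (the worker case) raises `ModuleNotFoundError`, while passing a folder that contains the helper module (the dagProcessor case) succeeds.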
### Use case/motivation
_No response_
### Related issues
_No response_
### Are you willing to submit a PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)