dolfinus opened a new issue, #28402:
URL: https://github.com/apache/airflow/issues/28402

   ### Description
   
   Currently I'm working on a managed Kubernetes cluster with the vSphere CSI driver, which supports PVCs with accessMode=ReadWriteOnce only, so I cannot mount the same volume with dags into all the pods (webserver, scheduler, etc.).
   
   I also cannot use gitSync, because for security reasons there is no network access from the k8s cluster to the git server. There is, however, access in the direction git -> CI runner -> k8s cluster.
   
   I wish there were a way to push dags from git to Airflow without implementing some overcomplicated deployment pipeline. For example: the CI runner pushes them to the volume mounted into the dagProcessor pod, the dagProcessor parses all the dags and saves them into the database, and then Airflow executes them from there.
   
   All the components (standalone dagProcessor, saving dags to the database, reading dags from the database) are already there; after dags are parsed, I can see their source code in the webserver. But when I try to run such a dag, I get an exception:
   ```
   [2022-12-16, 08:51:21 UTC] {taskinstance.py:1165} INFO - Dependencies all 
met for <TaskInstance: tutorial.print_date 
manual__2022-12-16T08:51:20.145317+00:00 [queued]>
   [2022-12-16, 08:51:21 UTC] {taskinstance.py:1165} INFO - Dependencies all 
met for <TaskInstance: tutorial.print_date 
manual__2022-12-16T08:51:20.145317+00:00 [queued]>
   [2022-12-16, 08:51:21 UTC] {taskinstance.py:1362} INFO - 
   
--------------------------------------------------------------------------------
   [2022-12-16, 08:51:21 UTC] {taskinstance.py:1363} INFO - Starting attempt 1 
of 2
   [2022-12-16, 08:51:21 UTC] {taskinstance.py:1364} INFO - 
   
--------------------------------------------------------------------------------
   [2022-12-16, 08:51:21 UTC] {taskinstance.py:1383} INFO - Executing 
<Task(BashOperator): print_date> on 2022-12-16 08:51:20.145317+00:00
   [2022-12-16, 08:51:21 UTC] {standard_task_runner.py:54} INFO - Started 
process 11404 to run task
   [2022-12-16, 08:51:21 UTC] {standard_task_runner.py:82} INFO - Running: 
['airflow', 'tasks', 'run', 'tutorial', 'print_date', 
'manual__2022-12-16T08:51:20.145317+00:00', '--job-id', '1015', '--raw', 
'--subdir', 'DAGS_FOLDER/tutorial.py', '--cfg-path', '/tmp/tmp40xmsmk9']
   [2022-12-16, 08:51:21 UTC] {standard_task_runner.py:83} INFO - Job 1015: 
Subtask print_date
   [2022-12-16, 08:51:21 UTC] {dagbag.py:525} INFO - Filling up the DagBag from 
/opt/airflow/dags/tutorial.py
   [2022-12-16, 08:51:21 UTC] {standard_task_runner.py:107} ERROR - Failed to 
execute job 1015 for task print_date (Dag 'tutorial' could not be found; either 
it does not exist or it failed to parse.; 11404)
   [2022-12-16, 08:51:21 UTC] {local_task_job.py:164} INFO - Task exited with 
return code 1
   [2022-12-16, 08:51:21 UTC] {local_task_job.py:273} INFO - 0 downstream tasks 
scheduled from follow-on schedule check
   ```
   
   This is caused by `airflow.cli.commands.task_command.task_run` loading the dag source code only from the file system of the worker/scheduler:
   
https://github.com/apache/airflow/blob/3bee4818e5d8f3ad8c1792453efb7d0c93a0236f/airflow/cli/commands/task_command.py#L378
   
https://github.com/apache/airflow/blob/3bee4818e5d8f3ad8c1792453efb7d0c93a0236f/airflow/utils/cli.py#L225-L226
   
https://github.com/apache/airflow/blob/3bee4818e5d8f3ad8c1792453efb7d0c93a0236f/airflow/models/dagbag.py#L98
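   For context, `DagBag` itself already supports reading serialized dags from the metadata database (this is how the webserver displays the source code); only this cli code path hardcodes the file system read. A minimal sketch of the two modes, assuming the Airflow 2.x `DagBag` API:
   
   ```python
   from airflow.models.dagbag import DagBag
   
   # What `airflow tasks run --raw` effectively does today: parse the dag file
   # from the local file system of the worker/scheduler pod.
   fs_bag = DagBag(dag_folder="/opt/airflow/dags/tutorial.py")
   
   # What DagBag can already do instead: load the serialized representation
   # from the metadata database, without touching the dags folder at all.
   db_bag = DagBag(read_dags_from_db=True)
   dag = db_bag.get_dag("tutorial")  # a SerializedDAG, or None if not found
   ```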
   
   My proposal: add a `--read-dags-from-db` argument to the `airflow tasks` cli commands (at least to the ones that require only read access to dags), and a config option in the `[scheduler]` section to pass this argument to the task runner.
   This would allow fetching the dag source code from the database instead of the file system, so the only pod that needs access to the dags PVC would be the dagProcessor.
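   To make this concrete, here is a rough sketch of how `airflow.utils.cli.get_dag` could grow such a switch. The `read_dags_from_db` flag is the new piece proposed here, not an existing option; everything else uses the current `DagBag` API:
   
   ```python
   from airflow.exceptions import AirflowException
   from airflow.models.dagbag import DagBag
   from airflow.utils.cli import process_subdir
   
   
   def get_dag(subdir, dag_id, read_dags_from_db=False):
       """Like airflow.utils.cli.get_dag, but optionally reading from the DB."""
       if read_dags_from_db:
           # No dags volume needed -- fetch the serialized dag instead.
           dagbag = DagBag(read_dags_from_db=True)
       else:
           # Current behavior: parse dag files from the local dags folder.
           dagbag = DagBag(process_subdir(subdir))
       dag = dagbag.get_dag(dag_id)
       if not dag:
           raise AirflowException(
               f"Dag {dag_id!r} could not be found; either it does not exist "
               "or it failed to parse."
           )
       return dag
   ```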
   
   This could also eliminate the need for a gitSync sidecar on all the pods running Airflow components. It would not cover every case, though: for example, if someone places a python module in the dags folder and imports it in a dag (see the example below), this will not work, because the module content is not saved in the database, and the dag import will fail if the module is not present on the worker's file system.
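   To illustrate the caveat (`my_helpers` and `build_date_command` are invented names for this hypothetical example): only the serialized dag below ends up in the database, `my_helpers.py` does not, so a pod reading dags only from the database could not re-import it:
   
   ```python
   # dags/tutorial_with_helper.py -- hypothetical example of the caveat
   import pendulum
   
   from my_helpers import build_date_command  # lives next to this file in dags/
   
   from airflow import DAG
   from airflow.operators.bash import BashOperator
   
   with DAG(
       dag_id="tutorial_with_helper",
       start_date=pendulum.datetime(2022, 12, 1, tz="UTC"),
       schedule=None,
   ) as dag:
       # The database stores only the serialized dag (with the rendered
       # bash_command string), not my_helpers.py itself -- so anything that
       # must re-parse this file still needs my_helpers.py on its file system.
       print_date = BashOperator(
           task_id="print_date",
           bash_command=build_date_command(),
       )
   ```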
   
   ### Use case/motivation
   
   _No response_
   
   ### Related issues
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

