[ 
https://issues.apache.org/jira/browse/AIRFLOW-6778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Willard updated AIRFLOW-6778:
-------------------------------------
    Description: 
The worker pods generated by the Kubernetes Executor force the DAGs PVC to be 
mounted at the Airflow DAGs folder.  This, combined with a general inability to 
specify arbitrary PVCs on workers (see AIRFLOW-3126 and the linked/duplicated 
issues), severely constrains the usability of worker pods and the Kubernetes 
Executor as a whole.

 

For example, suppose a DAGs-containing PVC is rooted at a Python package (e.g. 
{{package/}}) that needs to be installed on each worker (e.g. DAGs in 
{{package/dags/}}, package install point at {{package/setup.py}}, and Airflow 
DAGs location {{/airflow/dags}}).  The current static mount point logic then 
only allows a worker to either mount the entire package into the Airflow DAGs 
location, even though the actual DAGs are in a subdirectory, or mount only the 
package's sub-path {{package/dags}} (using the existing 
{{kubernetes.dags_volume_subpath}} config option).  While the latter at least 
places the DAGs correctly, it forgoes the required parent directory, so the 
package cannot be installed (the files under {{package/}} are not available).

 

-In general, the only approach that seems to work for the Kubernetes Executor 
is to specify a worker image with all DAG dependencies pre-loaded, which 
largely voids the usefulness of a single DAGs PVC that can be dynamically 
updated.  At best, one can include a {{requirements.txt}} in the PVC and use it 
in tandem with an entry-point script built into the image, but that still 
doesn't help with source installations of custom packages stored and updated in 
a PVC.-

Edit: Even the entry-point approach isn't possible, because worker pods are 
created using [the {{command}} field instead of 
{{args}}|https://kubernetes.io/docs/tasks/inject-data-application/define-command-argument-container/#notes], 
which overrides the image's entry point!
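
To make the {{command}} versus {{args}} point concrete, here is a minimal Python sketch of the rule described in the linked Kubernetes notes: a pod-level {{command}} replaces the image's entry point, while {{args}} alone would only replace the image's default arguments.  The function name {{effective_argv}}, the script path {{/install-deps.sh}} and the sample task command are hypothetical and used for illustration only; they are not taken from Airflow.

```python
from typing import List, Optional


def effective_argv(
    image_entrypoint: List[str],
    image_cmd: List[str],
    command: Optional[List[str]] = None,
    args: Optional[List[str]] = None,
) -> List[str]:
    """Return the argv a container runs, following the Kubernetes rules."""
    if command is not None:
        # A pod-level command replaces the image ENTRYPOINT, and the
        # image CMD is ignored as well.
        return list(command) + list(args or [])
    if args is not None:
        # Only args given: the image ENTRYPOINT still runs, with new arguments.
        return list(image_entrypoint) + list(args)
    # Nothing overridden: the image defaults apply.
    return list(image_entrypoint) + list(image_cmd)


# Hypothetical image whose entry point installs requirements from the PVC.
ENTRYPOINT = ["/install-deps.sh"]
CMD = ["airflow", "worker"]
# Hypothetical task command, standing in for what the executor would pass.
TASK = ["airflow", "run", "example_dag", "example_task", "2020-01-01"]

print(effective_argv(ENTRYPOINT, CMD, command=TASK))  # entry point is skipped
print(effective_argv(ENTRYPOINT, CMD, args=TASK))     # entry point still runs
```

Under these assumptions, only the {{args}} form would give an image-level install script a chance to run before the task.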

 

A quick fix for this situation is to allow one to specify the DAGs PVC mount 
point.  With this option, one can mount the PVC anywhere and specify an Airflow 
DAGs location that works in conjunction with the mount point (e.g. mount the 
PVC at {{/airflow/package}} and independently set the Airflow DAGs location to 
{{/airflow/package/dags}}).  In many cases, this option would also obviate the 
need for the marginally useful {{kubernetes.dags_volume_subpath}} option.
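
The difference between the two existing choices and the proposed mount point can be sketched in a few lines of Python.  This is only a model of how a volume mount with an optional sub-path maps PVC paths into the container; the helper {{container_path}}, the file {{dags/example_dag.py}} and the scenario labels are hypothetical, and the PVC paths are assumed to be relative to the package root.

```python
from pathlib import PurePosixPath
from typing import Optional


def container_path(pvc_path: str, mount_point: str, sub_path: str = "") -> Optional[str]:
    """Map a path inside the PVC to where it appears in the worker container.

    Returns None when the path lies outside the mounted sub-path and is
    therefore invisible to the worker.
    """
    relative = PurePosixPath(pvc_path)
    if sub_path:
        try:
            relative = relative.relative_to(PurePosixPath(sub_path))
        except ValueError:
            return None
    return str(PurePosixPath(mount_point) / relative)


# Hypothetical PVC contents, relative to the package root.
PVC_FILES = ["setup.py", "dags/example_dag.py"]

# (mount point, sub-path) for each case discussed in the description.
SCENARIOS = {
    "whole PVC at DAGs folder": ("/airflow/dags", ""),
    "sub-path at DAGs folder": ("/airflow/dags", "dags"),
    "proposed mount point": ("/airflow/package", ""),
}

for label, (mount_point, sub_path) in SCENARIOS.items():
    for pvc_path in PVC_FILES:
        print(label, pvc_path, "->", container_path(pvc_path, mount_point, sub_path))
```

In this model, only the last scenario exposes both {{setup.py}} and the DAGs, with the DAGs landing under {{/airflow/package/dags}}, which is where the Airflow DAGs location would then point.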

  was:
The worker pods generated by the Kubernetes Executor force the DAGs PVC to be 
mounted at the Airflow DAGs folder.  This, combined with a general inability to 
specify arbitrary PVCs on workers (see AIRFLOW-3126 and the linked/duplicated 
issues), severely constrains the usability of worker pods and the Kubernetes 
Executor as a whole.

 

For example, suppose a DAGs-containing PVC is rooted at a Python package (e.g. 
{{package/}}) that needs to be installed on each worker (e.g. DAGs in 
{{package/dags/}}, package install point at {{package/setup.py}}, and Airflow 
DAGs location {{/airflow/dags}}).  The current static mount point logic then 
only allows a worker to either mount the entire package into the Airflow DAGs 
location, even though the actual DAGs are in a subdirectory, or mount only the 
package's sub-path {{package/dags}} (using the existing 
{{kubernetes.dags_volume_subpath}} config option).  While the latter at least 
places the DAGs correctly, it forgoes the required parent directory, so the 
package cannot be installed (the files under {{package/}} are not available).

 

-In general, the only approach that seems to work for the Kubernetes Executor 
is to specify a worker image with all DAG dependencies pre-loaded, which 
largely voids the usefulness of a single DAGs PVC that can be dynamically 
updated.  At best, one can include a {{requirements.txt}} in the PVC and use it 
in tandem with an entry-point script built into the image, but that still 
doesn't help with source installations of custom packages stored and updated in 
a PVC.-

Edit: This isn't even possible, because worker pods are created using [the 
command field instead of args|#notes]!]

 

A quick fix for this situation is to allow one to specify the DAGs PVC mount 
point.  With this option, one can mount the PVC anywhere and specify an Airflow 
DAGs location that works in conjunction with the mount point (e.g. mount the 
PVC at {{/airflow/package}} and independently set the Airflow DAGs location to 
{{/airflow/package/dags}}).  In many cases, this option would also obviate the 
need for the marginally useful {{kubernetes.dags_volume_subpath}} option.


> Add a DAGs PVC Mount Point Option for Workers under Kubernetes Executor
> -----------------------------------------------------------------------
>
>                 Key: AIRFLOW-6778
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-6778
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: executor-kubernetes, worker
>    Affects Versions: 1.10.6, 1.10.7, 1.10.8, 1.10.9
>            Reporter: Brandon Willard
>            Assignee: Brandon Willard
>            Priority: Blocker
>              Labels: kubernetes, options
>
> The worker pods generated by the Kubernetes Executor force the DAGs PVC to be 
> mounted at the Airflow DAGs folder.  This, combined with a general inability 
> to specify arbitrary PVCs on workers (see AIRFLOW-3126 and the 
> linked/duplicated issues), severely constrains the usability of worker pods 
> and the Kubernetes Executor as a whole.
>  
> For example, suppose a DAGs-containing PVC is rooted at a Python package 
> (e.g. {{package/}}) that needs to be installed on each worker (e.g. DAGs in 
> {{package/dags/}}, package install point at {{package/setup.py}}, and Airflow 
> DAGs location {{/airflow/dags}}).  The current static mount point logic then 
> only allows a worker to either mount the entire package into the Airflow DAGs 
> location, even though the actual DAGs are in a subdirectory, or mount only 
> the package's sub-path {{package/dags}} (using the existing 
> {{kubernetes.dags_volume_subpath}} config option).  While the latter at least 
> places the DAGs correctly, it forgoes the required parent directory, so the 
> package cannot be installed (the files under {{package/}} are not available).
>  
> -In general, the only approach that seems to work for the Kubernetes Executor 
> is to specify a worker image with all DAG dependencies pre-loaded, which 
> largely voids the usefulness of a single DAGs PVC that can be dynamically 
> updated.  At best, one can include a {{requirements.txt}} in the PVC and use 
> it in tandem with an entry-point script built into the image, but that still 
> doesn't help with source installations of custom packages stored and updated 
> in a PVC.-
> Edit: Even the entry-point approach isn't possible, because worker pods are 
> created using [the {{command}} field instead of 
> {{args}}|https://kubernetes.io/docs/tasks/inject-data-application/define-command-argument-container/#notes], 
> which overrides the image's entry point!
>  
> A quick fix for this situation is to allow one to specify the DAGs PVC mount 
> point.  With this option, one can mount the PVC anywhere and specify an 
> Airflow DAGs location that works in conjunction with the mount point (e.g. 
> mount the PVC at {{/airflow/package}} and independently set the Airflow DAGs 
> location to {{/airflow/package/dags}}).  In many cases, this option would 
> also obviate the need for the marginally useful 
> {{kubernetes.dags_volume_subpath}} option.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
