[GitHub] [airflow] fengsi commented on pull request #8150: [AIRFLOW-3126] Add option to specify additional K8s volumes

GitBox Wed, 29 Jul 2020 00:27:24 -0700


fengsi commented on pull request #8150:
URL: https://github.com/apache/airflow/pull/8150#issuecomment-664664791



   I followed the [dumping YAML 
hack](https://github.com/apache/airflow/pull/8150#issuecomment-663355935) 
@xEviL mentioned and it worked.  The Helm template (based on 
[`stable/airflow`](https://github.com/helm/charts/tree/master/stable/airflow) 
chart) I ended up using is:
   ```
   apiVersion: v1
   kind: ConfigMap
   metadata:
     name: {{ include "airflow.fullname" . }}-files
     labels:
       app: {{ include "airflow.labels.app" . }}
       chart: {{ include "airflow.labels.chart" . }}
       release: {{ .Release.Name }}
       heritage: {{ .Release.Service }}
   data:
     kubernetes-executor-template.yaml: |
       apiVersion: v1
       kind: Pod
       metadata:
         name: {{ include "airflow.fullname" . }}-worker
       spec:
         containers:
           - name: {{ .Chart.Name }}-worker
             image: {{ .Values.airflow.airflow.image.repository }}:{{ 
.Values.airflow.airflow.image.tag }}
             imagePullPolicy: {{ .Values.airflow.airflow.image.pullPolicy}}
             envFrom:
               - configMapRef:
                   name: "{{ include "airflow.fullname" . }}-env"
             env:
               - name: AIRFLOW__CORE__EXECUTOR
                 value: LocalExecutor
               {{- if .Values.airflow.airflow.extraEnv }}
               {{- toYaml .Values.airflow.airflow.extraEnv | nindent 12 }}
               {{- end }}
             {{- if .Values.airflow.airflow.extraVolumeMounts }}
             volumeMounts:
             {{- toYaml .Values.airflow.airflow.extraVolumeMounts | nindent 12 
}}
             {{- end }}
         {{- if .Values.airflow.airflow.extraVolumes }}
         volumes:
         {{- toYaml .Values.airflow.airflow.extraVolumes | nindent 8 }}
         {{- end }}
   ```
   The `.Values.airflow.airflow` not `.Values.airflow` is because I used 
`stable/airflow` as a dependency of my own chart.
   The catches here are:
   1. I needed to specify both `metadata.name` and `spec.containers[*].name`.  
Though they will be overridden by other logic later, I cannot create the 
"reference Pod" with them omitted;
   2. I needed to hard-code `AIRFLOW__CORE__EXECUTOR=LocalExecutor` in the 
template for the `KubernetesExecutor` **Pod**, otherwise it will pick 
`AIRFLOW__CORE__EXECUTOR=KubernetesExecutor` from global settings and fail – 
that's what I mean by "partially contaminated by `KubernetesExecutor` settings";
   3. As you can see, I also made it to pick up `extraEnv`, `extraVolumeMounts` 
and `extraVolumes` from global settings.  This allows mounting correct stuff.
   This approach I took was partially due to the way the 
[`stable/airflow`](https://github.com/helm/charts/tree/master/stable/airflow) 
chart was implemented.
   
   ### What I learned about `pod_template_file` so far:
   Well, it worked, at least.
   **BUT**, unfortunately, it's more of a hack than a solution.  It's like 
reverse engineering a scratch Pod file: create an object based on the template 
(some parameters are expected but are really just placeholders), hack on the 
fly in Airflow `KubernetesExecutor` logic to make the real Pod object, and 
convert it to config again with modified values.  The problems I can see:
   1. There is really no way for a user to know exactly what the final outcome 
would be, unless diving into every single line of the source code and hack the 
hell out of if;
   2. This "two-way" binding approach is too dynamic, sometimes scary;
   3. With `pod_template_file`, tons of `AIRFLOW__KUBERNETES__*` settings are 
now split into two groups: one for those "meant for scheduler when launching 
Pods but don't really apply to any individual Pod", and one for those that 
"should be passed down to Pods individually".  I don't see a clear boundary 
between these two use cases, and of course for the latter one, a fixed template 
cannot do much, and it may still require more hacks like the 
`pod_mutation_hook` or `executor_config` mentioned by @mik-laj.  Guess this is 
an example of the "two-way" binding pattern we should probably avoid in the 
future.
   
   As for the right direction for next steps, my use case is somewhat covered 
by above hacks.  But I think rather than rushing into any particular approach, 
we may need more time gathering K8s use cases from more users (plus the K8s 
ecosystem is evolving fast too).
   And it may make more sense to start with a new executor when the right time 
comes, if we were lucky to find an elegant brand new solution.  We can then 
keep current `KubernetesExecutor` as is and don't try to make it more 
complicated than what it should have been.  Just my 2 cents. :)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [airflow] fengsi commented on pull request #8150: [AIRFLOW-3126] Add option to specify additional K8s volumes

Reply via email to