This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch v1-10-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit 96f2b6bb11fa2bfb72c51335235d4e0834d2fc03
Author: Jarek Potiuk <jarek.pot...@polidea.com>
AuthorDate: Mon Sep 28 00:13:36 2020 +0200

    Enables Kerberos sidecar support (#11130)

    Some of the users of Airflow are using Kerberos to authenticate
    their worker workflows. Airflow has a basic support for Kerberos
    for some of the operators and it has support to refresh the
    temporary Kerberos tokens via `airflow kerberos` command.

    This change adds support for the Kerberos side-car that connects
    to the Kerberos Key Distribution Center and retrieves the
    token using Keytab that should be deployed as Kubernetes Secret.

    It uses shared volume to share the temporary token. The nice
    thing about setting it up as a sidecar is that the Keytab
    is never shared with the workers - the secret is only mounted
    by the sidecar and the workers have only access to the temporary
    token.

    Depends on #11129

    (cherry picked from commit 4d2a7870704385db492081b41119c12a51445897)
---
 breeze                                         |  2 +-
 chart/README.md                                | 22 ++++++---
 chart/templates/_helpers.yaml                  | 12 +++++
 chart/templates/configmap.yaml                 |  4 ++
 chart/templates/workers/worker-deployment.yaml | 61 +++++++++++++++++++++++++
 chart/values.yaml                              | 63 ++++++++++++++++++++++++++
 6 files changed, 156 insertions(+), 8 deletions(-)

diff --git a/breeze b/breeze
index 175a4ab..ff5d7cb 100755
--- a/breeze
+++ b/breeze
@@ -3073,7 +3073,7 @@ function breeze::run_breeze_command() {
 # 3. last used version stored in ./build/PYTHON_MAJOR_MINOR_VERSION
 # 4. DEFAULT_PYTHON_MAJOR_MINOR_VERSION from scripts/ci/libraries/_initialization.sh
 #
-# Here points 2. and 3. are realised. If result is empty string , the 4. will be set in
+# Here points 2. and 3. are realized. If result is empty string , the 4. will be set in
 # the next step (sanity_checks::basic_sanity_checks() is called and the version is still not set by then)
 # finally, if --python flag is specified, it will override whatever is set above.
 #
diff --git a/chart/README.md b/chart/README.md
index 11dc632..8372bb4 100644
--- a/chart/README.md
+++ b/chart/README.md
@@ -74,8 +74,7 @@ helm upgrade airflow . \
   --set images.airflow.tag=8a0da78
 ```
 
-For local development purppose you can also u
-You can also build the image locally and use it via deployment method described by Breeze.
+For local development purpose you can also build the image locally and use it via deployment method described by Breeze.
 
 ## Mounting DAGS using Git-Sync side car with Persistence enabled
 
@@ -129,7 +128,7 @@ The following tables lists the configurable parameters of the Airflow chart and
 | `privateRegistry.repository` | Repository where base image lives (eg: quay.io) | `~` |
 | `networkPolicies.enabled` | Enable Network Policies to restrict traffic | `true` |
 | `airflowHome` | Location of airflow home directory | `/opt/airflow` |
-| `rbacEnabled` | Deploy pods with Kubernets RBAC enabled | `true` |
+| `rbacEnabled` | Deploy pods with Kubernetes RBAC enabled | `true` |
 | `executor` | Airflow executor (eg SequentialExecutor, LocalExecutor, CeleryExecutor, KubernetesExecutor) | `KubernetesExecutor` |
 | `allowPodLaunching` | Allow airflow pods to talk to Kubernetes API to launch more pods | `true` |
 | `defaultAirflowRepository` | Fallback docker repository to pull airflow image from | `apache/airflow` |
@@ -158,13 +157,22 @@ The following tables lists the configurable parameters of the Airflow chart and
 | `data.resultBackendSecretName` | Secret name to mount Celery result backend connection string from | `~` |
 | `data.metadataConection` | Field separated connection data (alternative to secret name) | `{}` |
 | `data.resultBackendConnection` | Field separated connection data (alternative to secret name) | `{}` |
-| `fernetKey` | String representing an Airflow fernet key | `~` |
-| `fernetKeySecretName` | Secret name for Airlow fernet key | `~` |
+| `fernetKey` | String representing an Airflow Fernet key | `~` |
+| `fernetKeySecretName` | Secret name for Airflow Fernet key | `~` |
+| `kerberos.enabled` | Enable kerberos support for workers | `false` |
+| `kerberos.ccacheMountPath` | Location of the ccache volume | `/var/kerberos-ccache` |
+| `kerberos.ccacheFileName` | Name of the ccache file | `ccache` |
+| `kerberos.configPath` | Path for the Kerberos config file | `/etc/krb5.conf` |
+| `kerberos.keytabPath` | Path for the Kerberos keytab file | `/etc/airflow.keytab` |
+| `kerberos.principal` | Name of the Kerberos principal | `airflow` |
+| `kerberos.reinitFrequency` | Frequency of reinitialization of the Kerberos token | `3600` |
+| `kerberos.confg` | Content of the configuration file for kerberos (might be templated using Helm templates) | `<see values.yaml>` |
 | `workers.replicas` | Replica count for Celery workers (if applicable) | `1` |
 | `workers.keda.enabled` | Enable KEDA autoscaling features | `false` |
 | `workers.keda.pollingInverval` | How often KEDA should poll the backend database for metrics in seconds | `5` |
 | `workers.keda.cooldownPeriod` | How often KEDA should wait before scaling down in seconds | `30` |
 | `workers.keda.maxReplicaCount` | Maximum number of Celery workers KEDA can scale to | `10` |
+| `workers.kerberosSideCar.enabled` | Enable Kerberos sidecar for the worker | `false` |
 | `workers.persistence.enabled` | Enable log persistence in workers via StatefulSet | `false` |
 | `workers.persistence.size` | Size of worker volumes if enabled | `100Gi` |
 | `workers.persistence.storageClassName` | StorageClass worker volumes should use if enabled | `default` |
@@ -196,8 +204,8 @@ The following tables lists the configurable parameters of the Airflow chart and
 | `webserver.resources.requests.cpu` | CPU Request of webserver | `~` |
 | `webserver.resources.requests.memory` | Memory Request of webserver | `~` |
 | `webserver.defaultUser` | Optional default airflow user information | `{}` |
-| `dags.persistence.*` | Dag persistence configutation | Please refer to `values.yaml` |
-| `dags.gitSync.*` | Git sync configuration | Please refer to `values.yaml` |
+| `dags.persistence.*` | Dag persistence configuration | Please refer to `values.yaml` |
+| `dags.gitSync.*` | Git sync configuration | Please refer to `values.yaml` |
 
 Specify each parameter using the `--set key=value[,key=value]` argument to `helm install`. For example,
 
diff --git a/chart/templates/_helpers.yaml b/chart/templates/_helpers.yaml
index 195d484..5d3ae73 100644
--- a/chart/templates/_helpers.yaml
+++ b/chart/templates/_helpers.yaml
@@ -235,6 +235,14 @@
 {{ default (printf "%s-elasticsearch" .Release.Name) .Values.elasticsearch.secretName }}
 {{- end }}
 
+{{ define "kerberos_keytab_secret" -}}
+{{ .Release.Name }}-kerberos-keytab
+{{- end }}
+
+{{ define "kerberos_ccache_path" -}}
+{{ printf "%s/%s" .Values.kerberos.ccacheMountPath .Values.kerberos.ccacheFileName }}
+{{- end }}
+
 {{ define "pgbouncer_config" }}
 {{- $pgMetadataHost := .Values.data.metadataConnection.host | default (printf "%s-%s.%s.svc.cluster.local" .Release.Name "postgresql" .Release.Namespace) }}
 {{- $pgResultBackendHost := .Values.data.resultBackendConnection.host | default (printf "%s-%s.%s.svc.cluster.local" .Release.Name "postgresql" .Release.Namespace) }}
@@ -265,6 +273,10 @@ log_connections = {{ .Values.pgbouncer.logConnections }}
 {{ (printf "%s/logs" .Values.airflowHome) | quote }}
 {{- end }}
 
+{{ define "airflow_logs_no_quote" -}}
+{{ (printf "%s/logs" .Values.airflowHome) }}
+{{- end }}
+
 {{ define "airflow_dags" -}}
 {{- if .Values.dags.gitSync.enabled -}}
 {{ (printf "%s/dags/%s/%s" .Values.airflowHome .Values.dags.gitSync.dest .Values.dags.gitSync.subPath ) }}
diff --git a/chart/templates/configmap.yaml b/chart/templates/configmap.yaml
index d5b4b08..b5bc656 100644
--- a/chart/templates/configmap.yaml
+++ b/chart/templates/configmap.yaml
@@ -62,4 +62,8 @@ data:
 {{- else }}
 {{ tpl (.Files.Get "files/pod-template-file.kubernetes-helm-yaml") . | nindent 4 }}
 {{- end }}
+{{- if .Values.kerberos.enabled }}
+  krb5.conf: |
+    {{ tpl .Values.kerberos.config . | nindent 4 }}
+{{- end }}
 {{- end }}
diff --git a/chart/templates/workers/worker-deployment.yaml b/chart/templates/workers/worker-deployment.yaml
index 23d2255..fe07e20 100644
--- a/chart/templates/workers/worker-deployment.yaml
+++ b/chart/templates/workers/worker-deployment.yaml
@@ -124,6 +124,15 @@ spec:
               mountPath: {{ template "airflow_config_path" . }}
               subPath: airflow.cfg
               readOnly: true
+{{- if .Values.workers.kerberosSidecar.enabled }}
+            - name: config
+              mountPath: {{ .Values.kerberos.configPath | quote }}
+              subPath: krb5.conf
+              readOnly: true
+            - name: kerberos-ccache
+              mountPath: {{ .Values.kerberos.ccacheMountPath | quote }}
+              readOnly: true
+{{- end }}
 {{- if .Values.scheduler.airflowLocalSettings }}
             - name: config
               mountPath: {{ template "airflow_local_setting_path" . }}
@@ -145,10 +154,62 @@ spec:
             - name: logs
               mountPath: {{ template "airflow_logs" . }}
 {{- end }}
+{{- if .Values.workers.kerberosSidecar.enabled }}
+            - name: KRB5_CONFIG
+              value: {{ .Values.kerberos.configPath | quote }}
+            - name: KRB5CCNAME
+              value: {{ include "kerberos_ccache_path" . | quote }}
+{{- end }}
+{{- if .Values.workers.kerberosSidecar.enabled }}
+        - name: worker-kerberos
+          image: {{ template "airflow_image" . }}
+          imagePullPolicy: {{ .Values.images.airflow.pullPolicy }}
+          args: ["kerberos"]
+          resources:
+{{ toYaml .Values.workers.resources | indent 12 }}
+          volumeMounts:
+            - name: logs
+              mountPath: {{ template "airflow_logs" . }}
+            - name: config
+              mountPath: {{ template "airflow_config_path" . }}
+              subPath: airflow.cfg
+              readOnly: true
+            - name: config
+              mountPath: {{ .Values.kerberos.configPath | quote }}
+              subPath: krb5.conf
+              readOnly: true
+{{- if .Values.scheduler.airflowLocalSettings }}
+            - name: config
+              mountPath: {{ template "airflow_local_setting_path" . }}
+              subPath: airflow_local_settings.py
+              readOnly: true
+{{- end }}
+            - name: kerberos-keytab
+              subPath: "kerberos.keytab"
+              mountPath: {{ .Values.kerberos.keytabPath | quote }}
+              readOnly: true
+            - name: kerberos-ccache
+              mountPath: {{ .Values.kerberos.ccacheMountPath | quote }}
+              readOnly: false
+          env:
+            - name: KRB5_CONFIG
+              value: {{ .Values.kerberos.configPath | quote }}
+            - name: KRB5CCNAME
+              value: {{ include "kerberos_ccache_path" . | quote }}
+          {{- include "custom_airflow_environment" . | indent 10 }}
+          {{- include "standard_airflow_environment" . | indent 10 }}
+{{- end }}
       volumes:
+        - name: kerberos-keytab
+          secret:
+            secretName: {{ include "kerberos_keytab_secret" . | quote }}
         - name: config
           configMap:
             name: {{ template "airflow_config" . }}
+{{- if .Values.kerberos.enabled }}
+        - name: kerberos-ccache
+          emptyDir: {}
+{{- end }}
 {{- if .Values.dags.persistence.enabled }}
         - name: dags
           persistentVolumeClaim:
diff --git a/chart/values.yaml b/chart/values.yaml
index c0b9ff5..513dc47 100644
--- a/chart/values.yaml
+++ b/chart/values.yaml
@@ -128,6 +128,59 @@ data:
 fernetKey: ~
 fernetKeySecretName: ~
 
+
+# In order to use kerberos you need to create secret containing the keytab file
+# The secret name should follow naming convention of the application where resources are
+# name {{ .Release-name }}-<POSTFIX>. In case of the keytab file, the postfix is "kerberos-keytab"
+# So if your release is named "my-release" the name of the secret should be "my-release-kerberos-keytab"
+#
+# The Keytab content should be available in the "kerberos.keytab" key of the secret.
+#
+#  apiVersion: v1
+#  kind: Secret
+#  data:
+#    kerberos.keytab: <base64_encoded keytab file content>
+#  type: Opaque
+#
+#
+#  If you have such keytab file you can do it with similar
+#
+#  kubectl create secret generic {{ .Release.name }}-kerberos-keytab --from-file=kerberos.keytab
+#
+kerberos:
+  enabled: false
+  ccacheMountPath: '/var/kerberos-ccache'
+  ccacheFileName: 'cache'
+  configPath: '/etc/krb5.conf'
+  keytabPath: '/etc/airflow.keytab'
+  principal: 'airf...@foo.com'
+  reinitFrequency: 3600
+  config: |
+    # This is an example config showing how you can use templating and how "example" config
+    # might look like. It works with the test kerberos server that we are using during integration
+    # testing at Apache Airflow (see `scripts/ci/docker-compose/integration-kerberos.yml` but in
+    # order to make it production-ready you must replace it with your own configuration that
+    # Matches your kerberos deployment. Administrators of your Kerberos instance should
+    # provide the right configuration.
+
+    [logging]
+    default = "FILE:{{ template "airflow_logs_no_quote" . }}/kerberos_libs.log"
+    kdc = "FILE:{{ template "airflow_logs_no_quote" . }}/kerberos_kdc.log"
+    admin_server = "FILE:{{ template "airflow_logs_no_quote" . }}/kadmind.log"
+
+    [libdefaults]
+    default_realm = FOO.COM
+    ticket_lifetime = 10h
+    renew_lifetime = 7d
+    forwardable = true
+
+    [realms]
+    FOO.COM = {
+      kdc = kdc-server.foo.com
+      admin_server = admin_server.foo.com
+    }
+
+
 # Airflow Worker Config
 workers:
   # Number of airflow celery workers in StatefulSet
@@ -161,6 +214,10 @@ workers:
     # of local-path provisioner.
     fixPermissions: false
 
+  kerberosSidecar:
+    # Enable kerberos sidecar
+    enabled: false
+
   resources: {}
   #  limits:
   #   cpu: 100m
@@ -495,6 +552,12 @@ config:
     timeout: 30
     retry_timeout: 'True'
 
+  kerberos:
+    keytab: '{{ .Values.kerberos.keytabPath }}'
+    reinit_frequency: '{{ .Values.kerberos.reinitFrequency }}'
+    principal: '{{ .Values.kerberos.principal }}'
+    ccache: '{{ .Values.kerberos.ccacheMountPath }}/{{ .Values.kerberos.ccacheFileName }}'
+
   kubernetes:
     namespace: '{{ .Release.Namespace }}'
     airflow_configmap: '{{ include "airflow_config" . }}'
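
For readers who want to try the change, the following is a minimal sketch of a values override that turns the sidecar on. It is not part of the commit. It assumes a Secret named `<release>-kerberos-keytab` with a `kerberos.keytab` key already exists, as the comments in `values.yaml` describe. The file name, realm, principal, and KDC host names are hypothetical placeholders. Going by the templates in this diff, both flags appear to be needed: `kerberos.enabled` guards the `krb5.conf` ConfigMap entry and the `kerberos-ccache` volume, while `workers.kerberosSidecar.enabled` guards the sidecar container and the worker mounts. The templates and `values.yaml` spell the key `kerberosSidecar`, not `kerberosSideCar` as in the README table.

```yaml
# kerberos-values.yaml (hypothetical file name)
kerberos:
  # Renders krb5.conf into the ConfigMap and adds the kerberos-ccache emptyDir volume.
  enabled: true
  # Hypothetical principal; use the one that matches your keytab.
  principal: 'airflow@EXAMPLE.COM'
  # Hypothetical realm and hosts; your Kerberos administrators should supply the real ones.
  config: |
    [libdefaults]
    default_realm = EXAMPLE.COM

    [realms]
    EXAMPLE.COM = {
      kdc = kdc.example.com
      admin_server = kadmin.example.com
    }

workers:
  kerberosSidecar:
    # Adds the worker-kerberos container plus the krb5.conf and ccache mounts on the worker.
    enabled: true
```

Such a file would typically be passed with Helm's `-f` option, for example `helm upgrade airflow . -f kerberos-values.yaml`.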