This is an automated email from the ASF dual-hosted git repository.
potiuk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/airflow.git
The following commit(s) were added to refs/heads/master by this push:
new 4d2a787 Enables Kerberos sidecar support (#11130)
4d2a787 is described below
commit 4d2a7870704385db492081b41119c12a51445897
Author: Jarek Potiuk <[email protected]>
AuthorDate: Mon Sep 28 00:13:36 2020 +0200
Enables Kerberos sidecar support (#11130)
Some Airflow users rely on Kerberos to authenticate their worker
workflows. Airflow has basic Kerberos support for some of the
operators, and it can refresh the temporary Kerberos tokens via
the `airflow kerberos` command.
This change adds support for a Kerberos sidecar that connects
to the Kerberos Key Distribution Center and retrieves the
token using a Keytab that should be deployed as a Kubernetes Secret.
The sidecar and the worker use a shared volume to share the
temporary token. The advantage of setting it up as a sidecar is
that the Keytab is never shared with the workers: the secret is
mounted only by the sidecar, and the workers have access only to
the temporary token.
Depends on #11129
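
A minimal sketch of a values override that would switch the feature on, assuming only the keys this commit adds to `chart/values.yaml`. The file name `kerberos-values.yaml`, the realm `EXAMPLE.COM`, and the principal are placeholders chosen for illustration and do not come from the commit.

```yaml
# kerberos-values.yaml (hypothetical file name)
kerberos:
  # Renders krb5.conf into the ConfigMap and adds the kerberos-ccache emptyDir volume.
  enabled: true
  # Placeholder principal; use the one your Kerberos administrators provide.
  principal: airflow@EXAMPLE.COM
  # Seconds between token re-initializations (the chart default).
  reinitFrequency: 3600
  # kerberos.config should also be overridden with your own krb5.conf content.

workers:
  kerberosSidecar:
    # Adds the worker-kerberos container and the ccache mounts to the worker.
    enabled: true
```

Going by the templates in this diff, both flags appear to be needed together: the `kerberos-ccache` volume is defined only when `kerberos.enabled` is true, while the containers mount it only when `workers.kerberosSidecar.enabled` is true. The `<release>-kerberos-keytab` Secret must already exist before the worker Pod can start.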
---
breeze | 2 +-
chart/README.md | 17 +++++--
chart/templates/_helpers.yaml | 12 +++++
chart/templates/configmap.yaml | 4 ++
chart/templates/workers/worker-deployment.yaml | 61 +++++++++++++++++++++++++
chart/tests/git-sync-worker_test.yaml | 6 +++
chart/values.yaml | 63 ++++++++++++++++++++++++++
docs/production-deployment.rst | 27 +++++++++++
docs/security/kerberos.rst | 2 +
9 files changed, 189 insertions(+), 5 deletions(-)
diff --git a/breeze b/breeze
index c8174af..a2b96f6 100755
--- a/breeze
+++ b/breeze
@@ -3011,7 +3011,7 @@ function breeze::run_breeze_command() {
# 3. last used version stored in ./build/PYTHON_MAJOR_MINOR_VERSION
# 4. DEFAULT_PYTHON_MAJOR_MINOR_VERSION from scripts/ci/libraries/_initialization.sh
#
-# Here points 2. and 3. are realised. If result is empty string , the 4. will be set in
+# Here points 2. and 3. are realized. If result is empty string , the 4. will be set in
# the next step (sanity_checks::basic_sanity_checks() is called and the version is still not set by then)
# finally, if --python flag is specified, it will override whatever is set above.
#
diff --git a/chart/README.md b/chart/README.md
index 0443c63..a17ef37 100644
--- a/chart/README.md
+++ b/chart/README.md
@@ -131,7 +131,7 @@ The following tables lists the configurable parameters of the Airflow chart and
| `ingress.flower.*` | Configs for the Ingress of the flower Service | Please refer to `values.yaml` |
| `networkPolicies.enabled` | Enable Network Policies to restrict traffic | `true` |
| `airflowHome` | Location of airflow home directory | `/opt/airflow` |
-| `rbacEnabled` | Deploy pods with Kubernets RBAC enabled | `true` |
+| `rbacEnabled` | Deploy pods with Kubernetes RBAC enabled | `true` |
| `executor` | Airflow executor (eg SequentialExecutor, LocalExecutor, CeleryExecutor, KubernetesExecutor) | `KubernetesExecutor` |
| `allowPodLaunching` | Allow airflow pods to talk to Kubernetes API to launch more pods | `true` |
| `defaultAirflowRepository` | Fallback docker repository to pull airflow image from | `apache/airflow` |
@@ -160,13 +160,22 @@ The following tables lists the configurable parameters of the Airflow chart and
| `data.resultBackendSecretName` | Secret name to mount Celery result backend connection string from | `~` |
| `data.metadataConection` | Field separated connection data (alternative to secret name) | `{}` |
| `data.resultBackendConnection` | Field separated connection data (alternative to secret name) | `{}` |
-| `fernetKey` | String representing an Airflow fernet key | `~` |
-| `fernetKeySecretName` | Secret name for Airlow fernet key | `~` |
+| `fernetKey` | String representing an Airflow Fernet key | `~` |
+| `fernetKeySecretName` | Secret name for Airflow Fernet key | `~` |
+| `kerberos.enabled` | Enable Kerberos support for workers | `false` |
+| `kerberos.ccacheMountPath` | Location of the ccache volume | `/var/kerberos-ccache` |
+| `kerberos.ccacheFileName` | Name of the ccache file | `cache` |
+| `kerberos.configPath` | Path for the Kerberos config file | `/etc/krb5.conf` |
+| `kerberos.keytabPath` | Path for the Kerberos keytab file | `/etc/airflow.keytab` |
+| `kerberos.principal` | Name of the Kerberos principal | `airflow` |
+| `kerberos.reinitFrequency` | Frequency of reinitialization of the Kerberos token | `3600` |
+| `kerberos.config` | Content of the configuration file for Kerberos (might be templated using Helm templates) | `<see values.yaml>` |
| `workers.replicas` | Replica count for Celery workers (if applicable) | `1` |
| `workers.keda.enabled` | Enable KEDA autoscaling features | `false` |
| `workers.keda.pollingInverval` | How often KEDA should poll the backend database for metrics in seconds | `5` |
| `workers.keda.cooldownPeriod` | How often KEDA should wait before scaling down in seconds | `30` |
| `workers.keda.maxReplicaCount` | Maximum number of Celery workers KEDA can scale to | `10` |
+| `workers.kerberosSidecar.enabled` | Enable Kerberos sidecar for the worker | `false` |
| `workers.persistence.enabled` | Enable log persistence in workers via StatefulSet | `false` |
| `workers.persistence.size` | Size of worker volumes if enabled | `100Gi` |
| `workers.persistence.storageClassName` | StorageClass worker volumes should use if enabled | `default` |
@@ -199,7 +208,7 @@ The following tables lists the configurable parameters of the Airflow chart and
| `webserver.resources.requests.memory` | Memory Request of webserver | `~` |
| `webserver.service.annotations` | Annotations to be added to the webserver service | `{}` |
| `webserver.defaultUser` | Optional default airflow user information | `{}` |
-| `dags.persistence.*` | Dag persistence configutation | Please refer to `values.yaml` |
+| `dags.persistence.*` | Dag persistence configuration | Please refer to `values.yaml` |
| `dags.gitSync.*` | Git sync configuration | Please refer to `values.yaml` |
diff --git a/chart/templates/_helpers.yaml b/chart/templates/_helpers.yaml
index 0b9695e..ef62493 100644
--- a/chart/templates/_helpers.yaml
+++ b/chart/templates/_helpers.yaml
@@ -235,6 +235,14 @@
{{ default (printf "%s-elasticsearch" .Release.Name) .Values.elasticsearch.secretName }}
{{- end }}
+{{ define "kerberos_keytab_secret" -}}
+{{ .Release.Name }}-kerberos-keytab
+{{- end }}
+
+{{ define "kerberos_ccache_path" -}}
+{{ printf "%s/%s" .Values.kerberos.ccacheMountPath .Values.kerberos.ccacheFileName }}
+{{- end }}
+
{{ define "pgbouncer_config" }}
{{- $pgMetadataHost := .Values.data.metadataConnection.host | default (printf "%s-%s.%s.svc.cluster.local" .Release.Name "postgresql" .Release.Namespace) }}
{{- $pgResultBackendHost := .Values.data.resultBackendConnection.host | default (printf "%s-%s.%s.svc.cluster.local" .Release.Name "postgresql" .Release.Namespace) }}
@@ -265,6 +273,10 @@ log_connections = {{ .Values.pgbouncer.logConnections }}
{{ (printf "%s/logs" .Values.airflowHome) | quote }}
{{- end }}
+{{ define "airflow_logs_no_quote" -}}
+{{ (printf "%s/logs" .Values.airflowHome) }}
+{{- end }}
+
{{ define "airflow_dags" -}}
{{- if .Values.dags.gitSync.enabled -}}
{{ (printf "%s/dags/%s/%s" .Values.airflowHome .Values.dags.gitSync.dest .Values.dags.gitSync.subPath ) }}
diff --git a/chart/templates/configmap.yaml b/chart/templates/configmap.yaml
index f78f883..495c4db 100644
--- a/chart/templates/configmap.yaml
+++ b/chart/templates/configmap.yaml
@@ -62,4 +62,8 @@ data:
{{- else }}
{{ tpl (.Files.Get "files/pod-template-file.yaml") . | nindent 4 }}
{{- end }}
+{{- if .Values.kerberos.enabled }}
+ krb5.conf: |
+ {{ tpl .Values.kerberos.config . | nindent 4 }}
+{{- end }}
{{- end }}
diff --git a/chart/templates/workers/worker-deployment.yaml b/chart/templates/workers/worker-deployment.yaml
index f963326..34c3292 100644
--- a/chart/templates/workers/worker-deployment.yaml
+++ b/chart/templates/workers/worker-deployment.yaml
@@ -124,6 +124,15 @@ spec:
mountPath: {{ template "airflow_config_path" . }}
subPath: airflow.cfg
readOnly: true
+ {{- if .Values.workers.kerberosSidecar.enabled }}
+ - name: config
+ mountPath: {{ .Values.kerberos.configPath | quote }}
+ subPath: krb5.conf
+ readOnly: true
+ - name: kerberos-ccache
+ mountPath: {{ .Values.kerberos.ccacheMountPath | quote }}
+ readOnly: true
+ {{- end }}
{{- if .Values.scheduler.airflowLocalSettings }}
- name: config
mountPath: {{ template "airflow_local_setting_path" . }}
@@ -145,10 +154,62 @@ spec:
- name: logs
mountPath: {{ template "airflow_logs" . }}
{{- end }}
+ {{- if .Values.workers.kerberosSidecar.enabled }}
+ - name: KRB5_CONFIG
+ value: {{ .Values.kerberos.configPath | quote }}
+ - name: KRB5CCNAME
+ value: {{ include "kerberos_ccache_path" . | quote }}
+ {{- end }}
+ {{- if .Values.workers.kerberosSidecar.enabled }}
+ - name: worker-kerberos
+ image: {{ template "airflow_image" . }}
+ imagePullPolicy: {{ .Values.images.airflow.pullPolicy }}
+ args: ["kerberos"]
+ resources:
+ {{ toYaml .Values.workers.resources | indent 12 }}
+ volumeMounts:
+ - name: logs
+ mountPath: {{ template "airflow_logs" . }}
+ - name: config
+ mountPath: {{ template "airflow_config_path" . }}
+ subPath: airflow.cfg
+ readOnly: true
+ - name: config
+ mountPath: {{ .Values.kerberos.configPath | quote }}
+ subPath: krb5.conf
+ readOnly: true
+ {{- if .Values.scheduler.airflowLocalSettings }}
+ - name: config
+ mountPath: {{ template "airflow_local_setting_path" . }}
+ subPath: airflow_local_settings.py
+ readOnly: true
+ {{- end }}
+ - name: kerberos-keytab
+ subPath: "kerberos.keytab"
+ mountPath: {{ .Values.kerberos.keytabPath | quote }}
+ readOnly: true
+ - name: kerberos-ccache
+ mountPath: {{ .Values.kerberos.ccacheMountPath | quote }}
+ readOnly: false
+ env:
+ - name: KRB5_CONFIG
+ value: {{ .Values.kerberos.configPath | quote }}
+ - name: KRB5CCNAME
+ value: {{ include "kerberos_ccache_path" . | quote }}
+ {{- include "custom_airflow_environment" . | indent 10 }}
+ {{- include "standard_airflow_environment" . | indent 10 }}
+ {{- end }}
volumes:
+ - name: kerberos-keytab
+ secret:
+ secretName: {{ include "kerberos_keytab_secret" . | quote }}
- name: config
configMap:
name: {{ template "airflow_config" . }}
+ {{- if .Values.kerberos.enabled }}
+ - name: kerberos-ccache
+ emptyDir: {}
+ {{- end }}
{{- if .Values.dags.persistence.enabled }}
- name: dags
persistentVolumeClaim:
diff --git a/chart/tests/git-sync-worker_test.yaml b/chart/tests/git-sync-worker_test.yaml
index 847a4dc..f9e2286 100644
--- a/chart/tests/git-sync-worker_test.yaml
+++ b/chart/tests/git-sync-worker_test.yaml
@@ -29,6 +29,9 @@ tests:
asserts:
- equal:
path: spec.template.spec.volumes[1].name
+ value: config
+ - equal:
+ path: spec.template.spec.volumes[2].name
value: dags
- it: should add dags volume to the worker if git sync is enabled & peristence is disabled
set:
@@ -41,6 +44,9 @@ tests:
asserts:
- equal:
path: spec.template.spec.volumes[1].name
+ value: config
+ - equal:
+ path: spec.template.spec.volumes[2].name
value: dags
- it: should add git sync container to worker if persistence is not enabled, but git sync is
set:
diff --git a/chart/values.yaml b/chart/values.yaml
index 1d582c7..a9a457a 100644
--- a/chart/values.yaml
+++ b/chart/values.yaml
@@ -181,6 +181,59 @@ data:
fernetKey: ~
fernetKeySecretName: ~
+
+# In order to use Kerberos, you need to create a secret containing the keytab file.
+# The secret name should follow the naming convention of the application, where resources are
+# named {{ .Release.Name }}-<POSTFIX>. In the case of the keytab file, the postfix is "kerberos-keytab".
+# So if your release is named "my-release", the name of the secret should be "my-release-kerberos-keytab".
+#
+# The keytab content should be available in the "kerberos.keytab" key of the secret.
+#
+# apiVersion: v1
+# kind: Secret
+# data:
+#   kerberos.keytab: <base64_encoded keytab file content>
+# type: Opaque
+#
+#
+# If you have such a keytab file, you can create the secret with a command similar to:
+#
+# kubectl create secret generic {{ .Release.Name }}-kerberos-keytab --from-file=kerberos.keytab
+#
+kerberos:
+ enabled: false
+ ccacheMountPath: '/var/kerberos-ccache'
+ ccacheFileName: 'cache'
+ configPath: '/etc/krb5.conf'
+ keytabPath: '/etc/airflow.keytab'
+ principal: '[email protected]'
+ reinitFrequency: 3600
+ config: |
+ # This is an example config showing how you can use templating and what an example config
+ # might look like. It works with the test Kerberos server that we use during integration
+ # testing at Apache Airflow (see `scripts/ci/docker-compose/integration-kerberos.yml`), but
+ # to make it production-ready you must replace it with your own configuration that
+ # matches your Kerberos deployment. Administrators of your Kerberos instance should
+ # provide the right configuration.
+
+ [logging]
+ default = "FILE:{{ template "airflow_logs_no_quote" . }}/kerberos_libs.log"
+ kdc = "FILE:{{ template "airflow_logs_no_quote" . }}/kerberos_kdc.log"
+ admin_server = "FILE:{{ template "airflow_logs_no_quote" . }}/kadmind.log"
+
+ [libdefaults]
+ default_realm = FOO.COM
+ ticket_lifetime = 10h
+ renew_lifetime = 7d
+ forwardable = true
+
+ [realms]
+ FOO.COM = {
+ kdc = kdc-server.foo.com
+ admin_server = admin_server.foo.com
+ }
+
+
# Airflow Worker Config
workers:
# Number of airflow celery workers in StatefulSet
@@ -214,6 +267,10 @@ workers:
# of local-path provisioner.
fixPermissions: false
+ kerberosSidecar:
+ # Enable kerberos sidecar
+ enabled: false
+
resources: {}
# limits:
# cpu: 100m
@@ -537,6 +594,12 @@ config:
timeout: 30
retry_timeout: 'True'
+ kerberos:
+ keytab: '{{ .Values.kerberos.keytabPath }}'
+ reinit_frequency: '{{ .Values.kerberos.reinitFrequency }}'
+ principal: '{{ .Values.kerberos.principal }}'
+ ccache: '{{ .Values.kerberos.ccacheMountPath }}/{{ .Values.kerberos.ccacheFileName }}'
+
kubernetes:
namespace: '{{ .Release.Namespace }}'
airflow_configmap: '{{ include "airflow_config" . }}'
diff --git a/docs/production-deployment.rst b/docs/production-deployment.rst
index fc0b9bb..450899a 100644
--- a/docs/production-deployment.rst
+++ b/docs/production-deployment.rst
@@ -467,3 +467,30 @@ More details about the images
You can read more details about the images - the context, their parameters and internal structure in the
`IMAGES.rst <https://github.com/apache/airflow/blob/master/IMAGES.rst>`_ document.
+
+.. _production-deployment:kerberos:
+
+Kerberos-authenticated workers
+==============================
+
+Apache Airflow has a built-in mechanism for authenticating operations with a KDC (Key Distribution Center).
+Airflow has a separate command ``airflow kerberos`` that acts as a token refresher. It uses the pre-configured
+Kerberos Keytab to authenticate with the KDC and obtain a valid token, and then refreshes the token
+at regular intervals within the current token expiry window.
+
+Each refresh request uses a configured principal, and only a keytab valid for the specified principal
+is capable of retrieving the authentication token.
+
+The best practice for securing this setup is to make sure that worker
+workloads have no access to the Keytab but only have access to the periodically refreshed, temporary
+authentication tokens. This can be achieved in a Docker environment by running the ``airflow kerberos``
+command and the worker command in separate containers, where only the ``airflow kerberos`` container has
+access to the Keytab file (preferably configured as a secret resource). Those two containers should share
+a volume where the temporary token is written by ``airflow kerberos`` and read by the workers.
+
+In the Kubernetes environment, this can be realized with a side-car, where both the Kerberos
+token refresher and the worker are part of the same Pod. Only the Kerberos side-car has access to
+the Keytab secret, and both containers in the same Pod share the volume where the temporary token is written by
+the side-car container and read by the worker container.
+
+This concept is implemented in the development version of the Helm Chart that is part of the Airflow source code.
diff --git a/docs/security/kerberos.rst b/docs/security/kerberos.rst
index e9cf3b0..7867fe9 100644
--- a/docs/security/kerberos.rst
+++ b/docs/security/kerberos.rst
@@ -133,3 +133,5 @@ To use kerberos authentication, you must install Airflow with the ``kerberos`` e
.. code-block:: bash
pip install 'apache-airflow[kerberos]'
+
+You can read about some production aspects of Kerberos deployment at :ref:`production-deployment:kerberos`.
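
The side-car arrangement described in the new documentation section can be sketched as a single Pod manifest. This is an illustration pieced together from the template changes in this diff, not actual chart output: the release name `my-release` follows the example in the `values.yaml` comment, while the Pod name, the ConfigMap name, and the worker `args` are assumptions made for the sketch.

```yaml
# Illustrative sketch only; the chart renders a worker Deployment, not a bare Pod.
apiVersion: v1
kind: Pod
metadata:
  name: my-release-worker-example        # hypothetical name
spec:
  containers:
    - name: worker
      image: apache/airflow              # chart's fallback repository, tag omitted
      args: ["worker"]                   # assumption: the worker command
      env:
        - name: KRB5_CONFIG
          value: "/etc/krb5.conf"
        - name: KRB5CCNAME
          value: "/var/kerberos-ccache/cache"
      volumeMounts:
        - name: config
          mountPath: "/etc/krb5.conf"
          subPath: krb5.conf
          readOnly: true
        # The worker only reads the temporary token; it never sees the keytab.
        - name: kerberos-ccache
          mountPath: "/var/kerberos-ccache"
          readOnly: true
    - name: worker-kerberos
      image: apache/airflow
      args: ["kerberos"]                 # runs the `airflow kerberos` refresher
      env:
        - name: KRB5_CONFIG
          value: "/etc/krb5.conf"
        - name: KRB5CCNAME
          value: "/var/kerberos-ccache/cache"
      volumeMounts:
        - name: config
          mountPath: "/etc/krb5.conf"
          subPath: krb5.conf
          readOnly: true
        # Only the side-car mounts the keytab Secret.
        - name: kerberos-keytab
          mountPath: "/etc/airflow.keytab"
          subPath: "kerberos.keytab"
          readOnly: true
        # The side-car writes the refreshed token to the shared volume.
        - name: kerberos-ccache
          mountPath: "/var/kerberos-ccache"
          readOnly: false
  volumes:
    - name: kerberos-keytab
      secret:
        secretName: "my-release-kerberos-keytab"
    - name: config
      configMap:
        name: my-release-airflow-config  # hypothetical ConfigMap name
    - name: kerberos-ccache
      emptyDir: {}
```

The two containers differ in their access to the two Kerberos volumes: the keytab Secret is mounted only in `worker-kerberos`, and the `emptyDir` is writable there but read-only in `worker`.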