This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/airflow.git


The following commit(s) were added to refs/heads/master by this push:
     new 4d2a787  Enables Kerberos sidecar support (#11130)
4d2a787 is described below

commit 4d2a7870704385db492081b41119c12a51445897
Author: Jarek Potiuk <[email protected]>
AuthorDate: Mon Sep 28 00:13:36 2020 +0200

    Enables Kerberos sidecar support (#11130)
    
    Some Airflow users rely on Kerberos to authenticate their worker
    workloads. Airflow has basic Kerberos support for some operators
    and can refresh the temporary Kerberos tokens via the
    `airflow kerberos` command.
    
    This change adds support for a Kerberos sidecar that connects
    to the Kerberos Key Distribution Center and retrieves the
    token using a keytab that should be deployed as a Kubernetes Secret.
    
    It uses a shared volume to share the temporary token. The nice
    thing about setting it up as a sidecar is that the keytab
    is never shared with the workers - the secret is only mounted
    by the sidecar and the workers only have access to the temporary
    token.
    
    Depends on #11129
---
 breeze                                         |  2 +-
 chart/README.md                                | 17 +++++--
 chart/templates/_helpers.yaml                  | 12 +++++
 chart/templates/configmap.yaml                 |  4 ++
 chart/templates/workers/worker-deployment.yaml | 61 +++++++++++++++++++++++++
 chart/tests/git-sync-worker_test.yaml          |  6 +++
 chart/values.yaml                              | 63 ++++++++++++++++++++++++++
 docs/production-deployment.rst                 | 27 +++++++++++
 docs/security/kerberos.rst                     |  2 +
 9 files changed, 189 insertions(+), 5 deletions(-)

diff --git a/breeze b/breeze
index c8174af..a2b96f6 100755
--- a/breeze
+++ b/breeze
@@ -3011,7 +3011,7 @@ function breeze::run_breeze_command() {
 #      3. last used version stored in ./build/PYTHON_MAJOR_MINOR_VERSION
 #      4. DEFAULT_PYTHON_MAJOR_MINOR_VERSION from 
scripts/ci/libraries/_initialization.sh
 #
-# Here points 2. and 3. are realised. If result is empty string , the 4. will 
be set in
+# Here points 2. and 3. are realized. If result is empty string , the 4. will 
be set in
 #      the next step (sanity_checks::basic_sanity_checks() is called and the 
version is still not set by then)
 #      finally, if  --python flag is specified, it will override whatever is 
set above.
 #
diff --git a/chart/README.md b/chart/README.md
index 0443c63..a17ef37 100644
--- a/chart/README.md
+++ b/chart/README.md
@@ -131,7 +131,7 @@ The following tables lists the configurable parameters of 
the Airflow chart and
 | `ingress.flower.*`                                    | Configs for the 
Ingress of the flower Service                                                   
             | Please refer to `values.yaml`                     |
 | `networkPolicies.enabled`                             | Enable Network 
Policies to restrict traffic                                                    
              | `true`                                            |
 | `airflowHome`                                         | Location of airflow 
home directory                                                                  
         | `/opt/airflow`                                    |
-| `rbacEnabled`                                         | Deploy pods with 
Kubernets RBAC enabled                                                          
            | `true`                                            |
+| `rbacEnabled`                                         | Deploy pods with 
Kubernetes RBAC enabled                                                         
            | `true`                                            |
 | `executor`                                            | Airflow executor (eg 
SequentialExecutor, LocalExecutor, CeleryExecutor, KubernetesExecutor)          
        | `KubernetesExecutor`                              |
 | `allowPodLaunching`                                   | Allow airflow pods 
to talk to Kubernetes API to launch more pods                                   
          | `true`                                            |
 | `defaultAirflowRepository`                            | Fallback docker 
repository to pull airflow image from                                           
             | `apache/airflow`                                  |
@@ -160,13 +160,22 @@ The following tables lists the configurable parameters of 
the Airflow chart and
 | `data.resultBackendSecretName`                        | Secret name to mount 
Celery result backend connection string from                                    
        | `~`                                               |
 | `data.metadataConection`                              | Field separated 
connection data (alternative to secret name)                                    
             | `{}`                                              |
 | `data.resultBackendConnection`                        | Field separated 
connection data (alternative to secret name)                                    
             | `{}`                                              |
-| `fernetKey`                                           | String representing 
an Airflow fernet key                                                           
         | `~`                                               |
-| `fernetKeySecretName`                                 | Secret name for 
Airlow fernet key                                                               
             | `~`                                               |
+| `fernetKey`                                           | String representing 
an Airflow Fernet key                                                           
         | `~`                                               |
+| `fernetKeySecretName`                                 | Secret name for 
Airflow Fernet key                                                              
             | `~`                                               |
+| `kerberos.enabled`                                    | Enable kerberos 
support for workers                                                             
             | `false`                                           |
+| `kerberos.ccacheMountPath`                            | Location of the 
ccache volume                                                                   
             | `/var/kerberos-ccache`                            |
+| `kerberos.ccacheFileName`                             | Name of the ccache 
file                                                                            
          | `cache`                                           |
+| `kerberos.configPath`                                 | Path for the 
Kerberos config file                                                            
                | `/etc/krb5.conf`                                  |
+| `kerberos.keytabPath`                                 | Path for the 
Kerberos keytab file                                                            
                | `/etc/airflow.keytab`                             |
+| `kerberos.principal`                                  | Name of the Kerberos 
principal                                                                       
        | `airflow`                                         |
+| `kerberos.reinitFrequency`                            | Frequency of 
reinitialization of the Kerberos token                                          
                | `3600`                                            |
+| `kerberos.config`                                     | Content of the 
configuration file for kerberos (might be templated using Helm templates)       
              | `<see values.yaml>`                               |
 | `workers.replicas`                                    | Replica count for 
Celery workers (if applicable)                                                  
           | `1`                                               |
 | `workers.keda.enabled`                                | Enable KEDA 
autoscaling features                                                            
                 | `false`                                           |
 | `workers.keda.pollingInverval`                        | How often KEDA 
should poll the backend database for metrics in seconds                         
              | `5`                                               |
 | `workers.keda.cooldownPeriod`                         | How often KEDA 
should wait before scaling down in seconds                                      
              | `30`                                              |
 | `workers.keda.maxReplicaCount`                        | Maximum number of 
Celery workers KEDA can scale to                                                
           | `10`                                              |
+| `workers.kerberosSidecar.enabled`                     | Enable Kerberos 
sidecar for the worker                                                          
             | `false`                                           |
 | `workers.persistence.enabled`                         | Enable log 
persistence in workers via StatefulSet                                          
                  | `false`                                           |
 | `workers.persistence.size`                            | Size of worker 
volumes if enabled                                                              
              | `100Gi`                                           |
 | `workers.persistence.storageClassName`                | StorageClass worker 
volumes should use if enabled                                                   
         | `default`                                         |
@@ -199,7 +208,7 @@ The following tables lists the configurable parameters of 
the Airflow chart and
 | `webserver.resources.requests.memory`                 | Memory Request of 
webserver                                                                       
           | `~`                                               |
 | `webserver.service.annotations`                       | Annotations to be 
added to the webserver service                                                  
           | `{}`                                              |
 | `webserver.defaultUser`                               | Optional default 
airflow user information                                                        
            | `{}`                                              |
-| `dags.persistence.*`                                  | Dag persistence 
configutation                                                                   
             | Please refer to `values.yaml`                     |
+| `dags.persistence.*`                                  | Dag persistence 
configuration                                                                   
             | Please refer to `values.yaml`                     |
 | `dags.gitSync.*`                                      | Git sync 
configuration                                                                   
                    | Please refer to `values.yaml`                     |
 
 
diff --git a/chart/templates/_helpers.yaml b/chart/templates/_helpers.yaml
index 0b9695e..ef62493 100644
--- a/chart/templates/_helpers.yaml
+++ b/chart/templates/_helpers.yaml
@@ -235,6 +235,14 @@
 {{ default (printf "%s-elasticsearch" .Release.Name) 
.Values.elasticsearch.secretName }}
 {{- end }}
 
+{{ define "kerberos_keytab_secret" -}}
+{{ .Release.Name }}-kerberos-keytab
+{{- end }}
+
+{{ define "kerberos_ccache_path" -}}
+{{ printf "%s/%s" .Values.kerberos.ccacheMountPath 
.Values.kerberos.ccacheFileName }}
+{{- end }}
+
 {{ define "pgbouncer_config" }}
 {{- $pgMetadataHost := .Values.data.metadataConnection.host | default (printf 
"%s-%s.%s.svc.cluster.local" .Release.Name "postgresql" .Release.Namespace) }}
 {{- $pgResultBackendHost := .Values.data.resultBackendConnection.host | 
default (printf "%s-%s.%s.svc.cluster.local" .Release.Name "postgresql" 
.Release.Namespace) }}
@@ -265,6 +273,10 @@ log_connections = {{ .Values.pgbouncer.logConnections }}
 {{ (printf "%s/logs" .Values.airflowHome) | quote }}
 {{- end }}
 
+{{ define "airflow_logs_no_quote" -}}
+{{ (printf "%s/logs" .Values.airflowHome) }}
+{{- end }}
+
 {{ define "airflow_dags" -}}
 {{- if .Values.dags.gitSync.enabled -}}
 {{ (printf "%s/dags/%s/%s" .Values.airflowHome .Values.dags.gitSync.dest 
.Values.dags.gitSync.subPath ) }}
diff --git a/chart/templates/configmap.yaml b/chart/templates/configmap.yaml
index f78f883..495c4db 100644
--- a/chart/templates/configmap.yaml
+++ b/chart/templates/configmap.yaml
@@ -62,4 +62,8 @@ data:
 {{- else }}
 {{ tpl (.Files.Get "files/pod-template-file.yaml") . | nindent 4 }}
 {{- end }}
+{{- if .Values.kerberos.enabled }}
+  krb5.conf: |
+    {{ tpl .Values.kerberos.config . | nindent 4 }}
+{{- end }}
 {{- end }}
diff --git a/chart/templates/workers/worker-deployment.yaml 
b/chart/templates/workers/worker-deployment.yaml
index f963326..34c3292 100644
--- a/chart/templates/workers/worker-deployment.yaml
+++ b/chart/templates/workers/worker-deployment.yaml
@@ -124,6 +124,15 @@ spec:
               mountPath: {{ template "airflow_config_path" . }}
               subPath: airflow.cfg
               readOnly: true
+            {{- if .Values.workers.kerberosSidecar.enabled }}
+            - name: config
+              mountPath: {{ .Values.kerberos.configPath | quote }}
+              subPath: krb5.conf
+              readOnly: true
+            - name: kerberos-ccache
+              mountPath: {{ .Values.kerberos.ccacheMountPath | quote }}
+              readOnly: true
+            {{- end }}
 {{- if .Values.scheduler.airflowLocalSettings }}
             - name: config
               mountPath: {{ template "airflow_local_setting_path" . }}
@@ -145,10 +154,62 @@ spec:
             - name: logs
               mountPath: {{ template "airflow_logs" . }}
 {{- end }}
+        {{- if .Values.workers.kerberosSidecar.enabled }}
+            - name: KRB5_CONFIG
+              value:  {{ .Values.kerberos.configPath | quote }}
+            - name: KRB5CCNAME
+              value:  {{ include "kerberos_ccache_path" . | quote }}
+        {{- end }}
+        {{- if .Values.workers.kerberosSidecar.enabled }}
+        - name: worker-kerberos
+          image: {{ template "airflow_image" . }}
+          imagePullPolicy: {{ .Values.images.airflow.pullPolicy }}
+          args: ["kerberos"]
+          resources:
+          {{ toYaml .Values.workers.resources | indent 12 }}
+          volumeMounts:
+            - name: logs
+              mountPath: {{ template "airflow_logs" . }}
+            - name: config
+              mountPath: {{ template "airflow_config_path" . }}
+              subPath: airflow.cfg
+              readOnly: true
+            - name: config
+              mountPath: {{ .Values.kerberos.configPath | quote }}
+              subPath: krb5.conf
+              readOnly: true
+            {{- if .Values.scheduler.airflowLocalSettings }}
+            - name: config
+              mountPath: {{ template "airflow_local_setting_path" . }}
+              subPath: airflow_local_settings.py
+              readOnly: true
+            {{- end }}
+            - name: kerberos-keytab
+              subPath: "kerberos.keytab"
+              mountPath: {{ .Values.kerberos.keytabPath | quote }}
+              readOnly: true
+            - name: kerberos-ccache
+              mountPath: {{ .Values.kerberos.ccacheMountPath | quote }}
+              readOnly: false
+          env:
+            - name: KRB5_CONFIG
+              value:  {{ .Values.kerberos.configPath | quote }}
+            - name: KRB5CCNAME
+              value:  {{ include "kerberos_ccache_path" . | quote }}
+          {{- include "custom_airflow_environment" . | indent 10 }}
+          {{- include "standard_airflow_environment" . | indent 10 }}
+        {{- end }}
       volumes:
+        - name: kerberos-keytab
+          secret:
+            secretName: {{ include "kerberos_keytab_secret" . | quote }}
         - name: config
           configMap:
             name: {{ template "airflow_config" . }}
+        {{- if .Values.kerberos.enabled }}
+        - name: kerberos-ccache
+          emptyDir: {}
+        {{- end }}
         {{- if .Values.dags.persistence.enabled }}
         - name: dags
           persistentVolumeClaim:
diff --git a/chart/tests/git-sync-worker_test.yaml 
b/chart/tests/git-sync-worker_test.yaml
index 847a4dc..f9e2286 100644
--- a/chart/tests/git-sync-worker_test.yaml
+++ b/chart/tests/git-sync-worker_test.yaml
@@ -29,6 +29,9 @@ tests:
     asserts:
       - equal:
           path: spec.template.spec.volumes[1].name
+          value: config
+      - equal:
+          path: spec.template.spec.volumes[2].name
           value: dags
   - it: should add dags volume to the worker if git sync is enabled & 
peristence is disabled
     set:
@@ -41,6 +44,9 @@ tests:
     asserts:
       - equal:
           path: spec.template.spec.volumes[1].name
+          value: config
+      - equal:
+          path: spec.template.spec.volumes[2].name
           value: dags
   - it: should add git sync container to worker if persistence is not enabled, 
but git sync is
     set:
diff --git a/chart/values.yaml b/chart/values.yaml
index 1d582c7..a9a457a 100644
--- a/chart/values.yaml
+++ b/chart/values.yaml
@@ -181,6 +181,59 @@ data:
 fernetKey: ~
 fernetKeySecretName: ~
 
+
+# To use Kerberos you need to create a secret containing the keytab file.
+# The secret name should follow the naming convention of the application, where 
resources are
+# named {{ .Release.Name }}-<POSTFIX>. In case of the keytab file, the postfix 
is "kerberos-keytab".
+# So if your release is named "my-release", the name of the secret should be 
"my-release-kerberos-keytab".
+#
+# The Keytab content should be available in the "kerberos.keytab" key of the 
secret.
+#
+#  apiVersion: v1
+#  kind: Secret
+#  data:
+#    kerberos.keytab: <base64_encoded keytab file content>
+#  type: Opaque
+#
+#
+#  If you have such a keytab file, you can create the secret with:
+#
+#  kubectl create secret generic {{ .Release.Name }}-kerberos-keytab 
--from-file=kerberos.keytab
+#
+kerberos:
+  enabled: false
+  ccacheMountPath: '/var/kerberos-ccache'
+  ccacheFileName: 'cache'
+  configPath: '/etc/krb5.conf'
+  keytabPath: '/etc/airflow.keytab'
+  principal: '[email protected]'
+  reinitFrequency: 3600
+  config: |
+    # This is an example config showing how you can use templating and what an 
"example" config
+    # might look like. It works with the test Kerberos server that we are 
using during integration
+    # testing at Apache Airflow (see 
`scripts/ci/docker-compose/integration-kerberos.yml`), but in
+    # order to make it production-ready you must replace it with your own 
configuration that
+    # matches your Kerberos deployment. Administrators of your Kerberos 
instance should
+    # provide the right configuration.
+
+    [logging]
+    default = "FILE:{{ template "airflow_logs_no_quote" . }}/kerberos_libs.log"
+    kdc = "FILE:{{ template "airflow_logs_no_quote" . }}/kerberos_kdc.log"
+    admin_server = "FILE:{{ template "airflow_logs_no_quote" . }}/kadmind.log"
+
+    [libdefaults]
+    default_realm = FOO.COM
+    ticket_lifetime = 10h
+    renew_lifetime = 7d
+    forwardable = true
+
+    [realms]
+    FOO.COM = {
+      kdc = kdc-server.foo.com
+      admin_server = admin_server.foo.com
+    }
+
+
 # Airflow Worker Config
 workers:
   # Number of airflow celery workers in StatefulSet
@@ -214,6 +267,10 @@ workers:
     # of local-path provisioner.
     fixPermissions: false
 
+  kerberosSidecar:
+    # Enable kerberos sidecar
+    enabled: false
+
   resources: {}
   #  limits:
   #   cpu: 100m
@@ -537,6 +594,12 @@ config:
     timeout: 30
     retry_timeout: 'True'
 
+  kerberos:
+    keytab: '{{ .Values.kerberos.keytabPath }}'
+    reinit_frequency: '{{ .Values.kerberos.reinitFrequency }}'
+    principal: '{{ .Values.kerberos.principal }}'
+    ccache: '{{ .Values.kerberos.ccacheMountPath }}/{{ 
.Values.kerberos.ccacheFileName }}'
+
   kubernetes:
     namespace: '{{ .Release.Namespace }}'
     airflow_configmap: '{{ include "airflow_config" . }}'
diff --git a/docs/production-deployment.rst b/docs/production-deployment.rst
index fc0b9bb..450899a 100644
--- a/docs/production-deployment.rst
+++ b/docs/production-deployment.rst
@@ -467,3 +467,30 @@ More details about the images
 
 You can read more details about the images - the context, their parameters and 
internal structure in the
 `IMAGES.rst <https://github.com/apache/airflow/blob/master/IMAGES.rst>`_ 
document.
+
+.. _production-deployment:kerberos:
+
+Kerberos-authenticated workers
+==============================
+
+Apache Airflow has a built-in mechanism for authenticating operations with 
a KDC (Key Distribution Center).
+A separate command, ``airflow kerberos``, acts as a token refresher. It uses 
the pre-configured
+Kerberos keytab to authenticate with the KDC and obtain a valid token, and 
then refreshes that token
+at regular intervals within the current token expiry window.
+
+Each refresh request uses the configured principal, and only a keytab valid 
for that principal
+is capable of retrieving the authentication token.
+
+The best practice for securing this setup is to make sure that worker
+workloads have no access to the keytab, only to the periodically refreshed, 
temporary
+authentication tokens. This can be achieved in a Docker environment by running 
the ``airflow kerberos``
+command and the worker command in separate containers, where only the 
``airflow kerberos`` container has
+access to the keytab file (preferably configured as a secret resource). Those 
two containers should share
+a volume where the temporary token is written by ``airflow kerberos`` and 
read by the workers.
+
+In a Kubernetes environment, this can be realized with the side-car pattern, 
where both the Kerberos
+token refresher and the worker are part of the same Pod. Only the Kerberos 
side-car has access to the
+keytab secret, and both containers in the same Pod share a volume where the 
temporary token is written by
+the side-car container and read by the worker container.
+
+This concept is implemented in the development version of the Helm Chart that 
is part of Airflow source code.
diff --git a/docs/security/kerberos.rst b/docs/security/kerberos.rst
index e9cf3b0..7867fe9 100644
--- a/docs/security/kerberos.rst
+++ b/docs/security/kerberos.rst
@@ -133,3 +133,5 @@ To use kerberos authentication, you must install Airflow 
with the ``kerberos`` e
 .. code-block:: bash
 
    pip install 'apache-airflow[kerberos]'
+
+You can read about some production aspects of kerberos deployment at 
:ref:`production-deployment:kerberos`
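
Editorial note, not part of the commit: the comments added to chart/values.yaml describe a Secret named "<release>-kerberos-keytab" whose "kerberos.keytab" key holds the base64-encoded keytab. The following is a minimal Python sketch of building such a manifest. The helper name `build_keytab_secret` and the placeholder keytab bytes are assumptions made for illustration; neither appears in the chart.

```python
import base64
import json


def build_keytab_secret(release_name: str, keytab_bytes: bytes) -> dict:
    """Build a Secret manifest following the chart's naming convention.

    Hypothetical helper for illustration only; it is not part of the chart.
    """
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        # The chart looks the secret up as "<release name>-kerberos-keytab".
        "metadata": {"name": f"{release_name}-kerberos-keytab"},
        "type": "Opaque",
        "data": {
            # Secret values under "data" must be base64-encoded strings.
            "kerberos.keytab": base64.b64encode(keytab_bytes).decode("ascii"),
        },
    }


if __name__ == "__main__":
    # Placeholder bytes stand in for the content of a real keytab file.
    manifest = build_keytab_secret("my-release", b"not-a-real-keytab")
    print(json.dumps(manifest, indent=2))
```

The script prints the manifest as JSON, which kubectl accepts in place of YAML. The `kubectl create secret generic ... --from-file=kerberos.keytab` command quoted in values.yaml should produce an equivalent Secret without any scripting.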
