4406arthur opened a new issue #11744:
URL: https://github.com/apache/airflow/issues/11744
**Apache Airflow version**: v1.10.12
**Kubernetes version**: v1.18.2
**Environment**:
- **Cloud provider or hardware configuration**: on-premise
- **OS** (e.g. from /etc/os-release): Red Hat 7.8
- **Kernel** (e.g. `uname -a`): 3.10
- **Install tools**: helm stable/airflow chart version 7.13.0
**What happened**:
The `kubectl describe` output for the failed task pod is below. The init container did not clone the repo; its logs show nothing but the git usage guide.
```text
Name:         avmpodetlrealnesetl-994b62e3a5c94be8a6c69d466064fa6e
Namespace:    airflow
Priority:     0
Node:         mlaas-k8s-worker-1/10.240.245.75
Start Time:   Thu, 22 Oct 2020 05:21:08 -0400
Labels:       airflow-worker=a55d639e-8c38-40ae-9851-b9710c60b2fd
              airflow_version=1.10.12
              dag_id=avm_pod_etl
              execution_date=2020-10-22T09_20_55.288604_plus_00_00
              kubernetes_executor=True
              task_id=real_nes_etl
              try_number=1
Annotations:  <none>
Status:       Failed
IP:           10.233.103.87
IPs:
  IP:  10.233.103.87
Init Containers:
  git-sync-clone:
    Container ID:   docker://5fa243b6c27f6734835e78612026d17d9a62454e35c51cf1fb9db4ff664994b7
    Image:          private-harbor:8080/library/git:latest
    Image ID:       docker-pullable://private-harbor:8080/library/git@sha256:18d268a6d938f513040674b38d6ea2484d2384aa6904cb8d9a96f7a5e8304ca7
    Port:           <none>
    Host Port:      <none>
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Thu, 22 Oct 2020 05:21:12 -0400
      Finished:     Thu, 22 Oct 2020 05:21:12 -0400
    Ready:          True
    Restart Count:  0
    Environment:
      GIT_SYNC_REPO:      ssh://xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
      GIT_SYNC_BRANCH:    master
      GIT_SYNC_ROOT:      /git
      GIT_SYNC_DEST:      repo
      GIT_SYNC_DEPTH:     1
      GIT_SYNC_ONE_TIME:  true
      GIT_SYNC_REV:
      GIT_SSH_KEY_FILE:   /etc/git-secret/ssh
      GIT_SYNC_ADD_USER:  true
      GIT_SYNC_SSH:       true
      GIT_KNOWN_HOSTS:    false
    Mounts:
      /etc/git-secret/ssh from git-sync-ssh-key (rw,path="ssh")
      /git from airflow-dags (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from airflow-token-9qssw (ro)
Containers:
  base:
    Container ID:  docker://9f23dea94ba7aad0674f54e3b35969d84b084e26942aa14da886a8a43646c039
    Image:         private-harbor:8080/library/airflow:1.10.12-python3.6
    Image ID:      docker-pullable://private-harbor:8080/library/airflow@sha256:9ea9e5ca66bd17632241889ab248fe3852c9f3c830ed299a8ecaa8a13ac2082f
    Port:          <none>
    Host Port:     <none>
    Command:
      airflow
      run
      avm_pod_etl
      real_nes_etl
      2020-10-22T09:20:55.288604+00:00
      --local
      --pool
      default_pool
      -sd
      /opt/airflow/dags/avm_dag.py
    State:          Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Thu, 22 Oct 2020 05:21:15 -0400
      Finished:     Thu, 22 Oct 2020 05:21:27 -0400
    Ready:          False
    Restart Count:  0
    Environment Variables from:
      airflow-env  ConfigMap  Optional: false
    Environment:
      AIRFLOW__CORE__DAGS_FOLDER:       /opt/airflow/dags/repo/
      AIRFLOW__CORE__EXECUTOR:          LocalExecutor
      AIRFLOW__CORE__SQL_ALCHEMY_CONN:  postgresql+psycopg2://airflow:[email protected]:5432/airflow
    Mounts:
      /opt/airflow/dags from airflow-dags (ro)
      /opt/airflow/logs from airflow-logs (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from airflow-token-9qssw (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  airflow-dags:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  airflow-logs:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  git-sync-ssh-key:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  airflow-secret
    Optional:    false
  airflow-token-9qssw:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  airflow-token-9qssw
    Optional:    false
QoS Class:       BestEffort
```
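For comparison, the upstream kubernetes/git-sync image needs no explicit `command`/`args` at all: its entrypoint is `/git-sync`, which consumes exactly the `GIT_SYNC_*` environment variables shown above. A minimal sketch of what the init container spec would look like with that image; the `k8s.gcr.io/git-sync:v3.1.6` reference is an assumption on my part, not something from my cluster:

```yaml
# Hypothetical pod-spec fragment: git-sync-clone using the upstream
# kubernetes/git-sync image, whose /git-sync entrypoint reads the
# GIT_SYNC_* env vars itself -- no command/args needed.
initContainers:
  - name: git-sync-clone
    image: k8s.gcr.io/git-sync:v3.1.6   # assumption: upstream image/tag
    env:
      - name: GIT_SYNC_REPO
        value: ssh://xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
      - name: GIT_SYNC_BRANCH
        value: master
      - name: GIT_SYNC_ROOT
        value: /git
      - name: GIT_SYNC_DEST
        value: repo
      - name: GIT_SYNC_ONE_TIME
        value: "true"
      # ...remaining GIT_SYNC_* vars as in the describe output above
    volumeMounts:
      - name: airflow-dags
        mountPath: /git
```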
**What you expected to happen**:
I expected the init container to clone the DAG repo into the `airflow-dags` volume. It looks like the generated `git-sync-clone` init container is missing its `command` and `args` parts.
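If the plain `git` image is kept, the init container would need an explicit command to perform the clone itself. Something like this hypothetical sketch (the shell one-liner is mine, not something Airflow generates):

```yaml
# Hypothetical pod-spec fragment: an explicit clone command for a plain git
# image, reusing the GIT_SYNC_* env vars already present in the pod spec.
initContainers:
  - name: git-sync-clone
    image: private-harbor:8080/library/git:latest
    command: ["sh", "-c"]
    args:
      - git clone --depth 1 --branch "$GIT_SYNC_BRANCH" "$GIT_SYNC_REPO" "$GIT_SYNC_ROOT/$GIT_SYNC_DEST"
```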
**How to reproduce it**:
I'm sharing my Helm values below; this may just be a config issue.
```yaml
###################################
# Airflow - Common Configs
###################################
airflow:
  ## configs for the docker image of the web/scheduler/worker
  image:
    repository: private-harbor:8080/library/airflow
    tag: 1.10.12-python3.6
    ## values: Always or IfNotPresent
    pullPolicy: IfNotPresent
    pullSecret: ""

  executor: KubernetesExecutor
  fernetKey: "7T512UXSSmBOkpWimFHIVb8jK6lfmSAvx4mO6Arehnc="
  config:
    AIRFLOW__WEBSERVER__ENABLE_PROXY_FIX: "True"
    AIRFLOW__CORE__LOAD_EXAMPLES: "True"
    AIRFLOW__KUBERNETES__WORKER_CONTAINER_REPOSITORY: private-harbor:8080/library/airflow
    AIRFLOW__KUBERNETES__WORKER_CONTAINER_TAG: 1.10.12-python3.6
    AIRFLOW__KUBERNETES__GIT_SYNC_CONTAINER_REPOSITORY: private-harbor:8080/library/git
    AIRFLOW__KUBERNETES__GIT_SYNC_CONTAINER_TAG: latest
    AIRFLOW__KUBERNETES__GIT_REPO: "ssh://xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
    AIRFLOW__KUBERNETES__GIT_BRANCH: "master"
    AIRFLOW__KUBERNETES__GIT_SSH_KEY_SECRET_NAME: "airflow-secret"
    AIRFLOW__KUBERNETES__GIT_DAGS_FOLDER_MOUNT_POINT: "/opt/airflow/dags"
    AIRFLOW__KUBERNETES__RUN_AS_USER: "50000"
    AIRFLOW__KUBERNETES__DELETE_WORKER_PODS: "False"
    AIRFLOW__KUBERNETES__DAGS_IN_IMAGE: "False"

###################################
# Airflow - Scheduler Configs
###################################
scheduler:
  resources: {}

  ## the nodeSelector configs for the scheduler Pods
  ##
  nodeSelector: {}

  ## the affinity configs for the scheduler Pods
  ##
  affinity: {}

  ## the toleration configs for the scheduler Pods
  ##
  tolerations: []

  ## the security context for the scheduler Pods
  ##
  securityContext: {}

  ## labels for the scheduler Deployment
  ##
  labels: {}

  ## Pod labels for the scheduler Deployment
  ##
  podLabels: {}

  ## annotations for the scheduler Deployment
  ##
  annotations: {}

  ## Pod Annotations for the scheduler Deployment
  ##
  podAnnotations: {}

  ## if we should tell Kubernetes Autoscaler that it's safe to evict these Pods
  ##
  safeToEvict: true

  ## configs for the PodDisruptionBudget of the scheduler
  ##
  podDisruptionBudget:
    ## if a PodDisruptionBudget resource is created for the scheduler
    ##
    enabled: true

    ## the maximum unavailable pods/percentage for the scheduler
    ##
    ## NOTE:
    ## - as there is only ever a single scheduler Pod,
    ##   this must be 100% for Kubernetes to be able to migrate it
    ##
    maxUnavailable: "100%"

    ## the minimum available pods/percentage for the scheduler
    ##
    minAvailable: ""

  connections: []

  ## if `scheduler.connections` are deleted and re-added after each scheduler restart
  ##
  refreshConnections: true

  ## custom airflow variables for the airflow scheduler
  ##
  ## NOTE:
  ## - THIS IS A STRING, containing a JSON object, with your variables in it
  ##
  ## EXAMPLE:
  ##   variables: |
  ##     { "environment": "dev" }
  ##
  variables: |
    {}

  ## custom airflow pools for the airflow scheduler
  ##
  ## NOTE:
  ## - THIS IS A STRING, containing a JSON object, with your pools in it
  ##
  ## EXAMPLE:
  ##   pools: |
  ##     {
  ##       "example": {
  ##         "description": "This is an example pool with 2 slots.",
  ##         "slots": 2
  ##       }
  ##     }
  ##
  pools: |
    {}

  ## the value of the `airflow --num_runs` parameter used to run the airflow scheduler
  ##
  ## NOTE:
  ## - this is the number of 'dag refreshes' before the airflow scheduler process will exit
  ## - if not set to `-1`, the scheduler Pod will restart regularly
  ## - for most environments, `-1` will be an acceptable value
  ##
  numRuns: -1

  ## if we run `airflow initdb` when the scheduler starts
  ##
  initdb: true

  ## if we run `airflow initdb` inside a special initContainer
  ##
  ## NOTE:
  ## - may be needed if you have custom database hooks configured that will be pulled in by git-sync
  ##
  preinitdb: false

  ## the number of seconds to wait (in bash) before starting the scheduler container
  ##
  initialStartupDelay: 0

  livenessProbe:
    enabled: true
    initialDelaySeconds: 300
    periodSeconds: 30
    failureThreshold: 5

###################################
# Airflow - Worker Configs
###################################
workers:
  ## if the airflow workers StatefulSet should be deployed
  ##
  enabled: false

###################################
# Airflow - Flower Configs
###################################
flower:
  ## if the Flower UI should be deployed
  ##
  ## NOTE:
  ## - only takes effect if `airflow.executor` is `CeleryExecutor`
  ##
  enabled: false

###################################
# Airflow - Logs Configs
###################################
logs:
  ## the airflow logs folder
  ##
  path: /opt/airflow/logs

  ## configs for the logs PVC
  ##
  persistence:
    ## if a persistent volume is mounted at `logs.path`
    ##
    enabled: false

    ## the name of an existing PVC to use
    ##
    existingClaim: ""

    ## sub-path under `logs.persistence.existingClaim` to use
    ##
    subPath: ""

    ## the name of the StorageClass used by the PVC
    ##
    ## NOTE:
    ## - if set to "", then `PersistentVolumeClaim/spec.storageClassName` is omitted
    ## - if set to "-", then `PersistentVolumeClaim/spec.storageClassName` is set to ""
    ##
    storageClass: ""

    ## the access mode of the PVC
    ##
    ## WARNING:
    ## - must be: `ReadWriteMany`
    ##
    ## NOTE:
    ## - different StorageClasses support different access modes:
    ##   https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes
    ##
    accessMode: ReadWriteMany

    ## the size of PVC to request
    ##
    size: 1Gi

###################################
# Airflow - DAGs Configs
###################################
dags:
  ## the airflow dags folder
  ##
  path: /opt/airflow/dags

  ## whether to disable pickling dags from the scheduler to workers
  ##
  ## NOTE:
  ## - sets AIRFLOW__CORE__DONOT_PICKLE
  ##
  doNotPickle: false

  ## install any Python `requirements.txt` at the root of `dags.path` automatically
  ##
  ## WARNING:
  ## - if set to true, and you are using `dags.git.gitSync`, you must also enable
  ##   `dags.initContainer` to ensure the requirements.txt is available at Pod start
  ##
  installRequirements: false

  ## configs for the dags PVC
  ##
  persistence:
    ## if a persistent volume is mounted at `dags.path`
    ##
    enabled: false

    ## the name of an existing PVC to use
    ##
    existingClaim: ""

    ## sub-path under `dags.persistence.existingClaim` to use
    ##
    subPath: ""

    ## the name of the StorageClass used by the PVC
    ##
    ## NOTE:
    ## - if set to "", then `PersistentVolumeClaim/spec.storageClassName` is omitted
    ## - if set to "-", then `PersistentVolumeClaim/spec.storageClassName` is set to ""
    ##
    storageClass: ""

    ## the access mode of the PVC
    ##
    ## WARNING:
    ## - must be one of: `ReadOnlyMany` or `ReadWriteMany`
    ##
    ## NOTE:
    ## - different StorageClasses support different access modes:
    ##   https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes
    ##
    accessMode: ReadOnlyMany

    ## the size of PVC to request
    ##
    size: 1Gi

  ## configs for the DAG git repository & sync container
  ##
  git:
    ## url of the git repository
    ##
    ## EXAMPLE: (HTTP)
    ##   url: "https://github.com/torvalds/linux.git"
    ##
    ## EXAMPLE: (SSH)
    ##   url: "ssh://[email protected]:torvalds/linux.git"
    ##
    url: "ssh://xxxxxxxxxxxxxxxxxxxxxxxx"

    ## the branch/tag/sha1 which we clone
    ##
    ref: master

    ## the name of a pre-created secret containing files for ~/.ssh/
    ##
    ## NOTE:
    ## - this is ONLY RELEVANT for SSH git repos
    ## - the secret commonly includes files: id_rsa, id_rsa.pub, known_hosts
    ## - known_hosts is NOT NEEDED if `git.sshKeyscan` is true
    ##
    secret: "airflow-git-keys"

    ## if we should implicitly trust [git.repoHost]:git.repoPort, by auto creating a ~/.ssh/known_hosts
    ##
    ## WARNING:
    ## - setting true will increase your vulnerability to a repo spoofing attack
    ##
    ## NOTE:
    ## - this is ONLY RELEVANT for SSH git repos
    ## - this is not needed if known_hosts is provided in `git.secret`
    ## - git.repoHost and git.repoPort ARE REQUIRED for this to work
    ##
    sshKeyscan: false

    ## the name of the private key file in your `git.secret`
    ##
    ## NOTE:
    ## - this is ONLY RELEVANT for PRIVATE SSH git repos
    ##
    privateKeyName: id_rsa

    ## the host name of the git repo
    ##
    ## NOTE:
    ## - this is ONLY REQUIRED for SSH git repos
    ##
    ## EXAMPLE:
    ##   repoHost: "github.com"
    ##
    repoHost: "10.240.245.11"

    ## the port of the git repo
    ##
    ## NOTE:
    ## - this is ONLY REQUIRED for SSH git repos
    ##
    repoPort: 22

    ## configs for the git-sync container
    ##
    gitSync:
      ## enable the git-sync sidecar container
      ##
      enabled: true

      ## resource requests/limits for the git-sync container
      ##
      ## NOTE:
      ## - when `workers.autoscaling` is true, YOU MUST SPECIFY a resource request
      ##
      ## EXAMPLE:
      ##   resources:
      ##     requests:
      ##       cpu: "50m"
      ##       memory: "64Mi"
      ##
      resources: {}

      ## the docker image for the git-sync container
      image:
        repository: private-harbor:8080/library/git
        tag: latest
        ## values: Always or IfNotPresent
        pullPolicy: Always

      ## the git sync interval in seconds
      ##
      refreshTime: 60

    ## configs for the git-clone container
    ##
    ## NOTE:
    ## - use this container if you want to only clone the external git repo
    ##   at Pod start-time, and not keep it synchronised afterwards
    ##
    initContainer:
      ## enable the git-clone sidecar container
      ##
      ## NOTE:
      ## - this is NOT required for the git-sync sidecar to work
      ## - this is mostly used for when `dags.installRequirements` is true to ensure that
      ##   requirements.txt is available at Pod start
      ##
      enabled: false

      ## resource requests/limits for the git-clone container
      ##
      ## EXAMPLE:
      ##   resources:
      ##     requests:
      ##       cpu: "50m"
      ##       memory: "64Mi"
      ##
      resources: {}

      ## the docker image for the git-clone container
      image:
        repository: private-harbor:8080/library/git
        tag: latest
        ## values: Always or IfNotPresent
        pullPolicy: Always

      ## path to mount dags-data volume to
      ##
      ## WARNING:
      ## - this path is also used by the git-sync container
      ##
      mountPath: "/dags"

      ## sub-path under `dags.initContainer.mountPath` to sync dags to
      ##
      ## WARNING:
      ## - this path is also used by the git-sync container
      ## - this MUST INCLUDE the leading /
      ##
      ## EXAMPLE:
      ##   syncSubPath: "/subdirWithDags"
      ##
      syncSubPath: ""

###################################
# Kubernetes - RBAC
###################################
rbac:
  ## if Kubernetes RBAC resources are created
  ##
  ## NOTE:
  ## - these allow the service account to create/delete Pods in the airflow namespace,
  ##   which is required for the KubernetesPodOperator() to function
  ##
  create: true

  ## if the created RBAC Role has GET/LIST on Event resources
  ##
  ## NOTE:
  ## - this is needed for KubernetesPodOperator() to use `log_events_on_failure=True`
  ##
  events: false

###################################
# Kubernetes - Service Account
###################################
serviceAccount:
  ## if a Kubernetes ServiceAccount is created
  ##
  ## NOTE:
  ## - if false, you must create the service account outside of this helm chart,
  ##   with the name: `serviceAccount.name`
  ##
  create: true

  ## the name of the ServiceAccount
  ##
  ## NOTE:
  ## - by default the name is generated using the `airflow.serviceAccountName` template in `_helpers.tpl`
  ##
  name: ""

  ## annotations for the ServiceAccount
  ##
  ## EXAMPLE: (to use WorkloadIdentity in Google Cloud)
  ##   annotations:
  ##     iam.gke.io/gcp-service-account: <<GCP_SERVICE>>@<<GCP_PROJECT>>.iam.gserviceaccount.com
  ##
  annotations: {}
```
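If this is indeed a config issue, my guess is it is around the git-sync image: `library/git` appears to be a plain git image (its logs show the git usage guide), whereas the `GIT_SYNC_*` variables are meant for a git-sync build. A hedged sketch of the values I would try changing; the `private-harbor:8080/library/git-sync` mirror path and `v3.1.6` tag are hypothetical placeholders for whatever mirror is used:

```yaml
# Hypothetical values delta: point both the KubernetesExecutor worker pods
# and the chart's git-sync sidecar at a git-sync build instead of a plain
# git image. The repository path and tag below are placeholders.
airflow:
  config:
    AIRFLOW__KUBERNETES__GIT_SYNC_CONTAINER_REPOSITORY: private-harbor:8080/library/git-sync
    AIRFLOW__KUBERNETES__GIT_SYNC_CONTAINER_TAG: v3.1.6

dags:
  git:
    gitSync:
      image:
        repository: private-harbor:8080/library/git-sync
        tag: v3.1.6
```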