karakanb opened a new issue, #33066: URL: https://github.com/apache/airflow/issues/33066
### Apache Airflow version

2.6.3

### What happened

I regularly see these logs in my scheduler logs every 10 minutes:

```
[2023-08-03T11:54:27.257+0000] {kubernetes_executor.py:114} ERROR - Unknown error in KubernetesJobWatcher. Failing
2023-08-03 14:54:27.266 Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.10/site-packages/urllib3/response.py", line 761, in _update_chunk_length
    self.chunk_left = int(line, 16)
ValueError: invalid literal for int() with base 16: b''
2023-08-03 14:54:27.266 During handling of the above exception, another exception occurred:
2023-08-03 14:54:27.266 Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.10/site-packages/urllib3/response.py", line 444, in _error_catcher
    yield
  File "/home/airflow/.local/lib/python3.10/site-packages/urllib3/response.py", line 828, in read_chunked
    self._update_chunk_length()
  File "/home/airflow/.local/lib/python3.10/site-packages/urllib3/response.py", line 765, in _update_chunk_length
    raise InvalidChunkLength(self, line)
urllib3.exceptions.InvalidChunkLength: InvalidChunkLength(got length b'', 0 bytes read)
2023-08-03 14:54:27.266 During handling of the above exception, another exception occurred:
2023-08-03 14:54:27.266 Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/executors/kubernetes_executor.py", line 105, in run
    self.resource_version = self._run(
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/executors/kubernetes_executor.py", line 161, in _run
    for event in self._pod_events(kube_client=kube_client, query_kwargs=kwargs):
  File "/home/airflow/.local/lib/python3.10/site-packages/kubernetes/watch/watch.py", line 165, in stream
    for line in iter_resp_lines(resp):
  File "/home/airflow/.local/lib/python3.10/site-packages/kubernetes/watch/watch.py", line 56, in iter_resp_lines
    for seg in resp.stream(amt=None, decode_content=False):
  File "/home/airflow/.local/lib/python3.10/site-packages/urllib3/response.py", line 624, in stream
    for line in self.read_chunked(amt, decode_content=decode_content):
  File "/home/airflow/.local/lib/python3.10/site-packages/urllib3/response.py", line 816, in read_chunked
    with self._error_catcher():
  File "/usr/local/lib/python3.10/contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/home/airflow/.local/lib/python3.10/site-packages/urllib3/response.py", line 461, in _error_catcher
    raise ProtocolError("Connection broken: %r" % e, e)
urllib3.exceptions.ProtocolError: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))
Process KubernetesJobWatcher-3:
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.10/site-packages/urllib3/response.py", line 761, in _update_chunk_length
    self.chunk_left = int(line, 16)
ValueError: invalid literal for int() with base 16: b''
2023-08-03 14:54:27.272 During handling of the above exception, another exception occurred:
2023-08-03 14:54:27.272 Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.10/site-packages/urllib3/response.py", line 444, in _error_catcher
    yield
  File "/home/airflow/.local/lib/python3.10/site-packages/urllib3/response.py", line 828, in read_chunked
    self._update_chunk_length()
  File "/home/airflow/.local/lib/python3.10/site-packages/urllib3/response.py", line 765, in _update_chunk_length
    raise InvalidChunkLength(self, line)
urllib3.exceptions.InvalidChunkLength: InvalidChunkLength(got length b'', 0 bytes read)
2023-08-03 14:54:27.272 During handling of the above exception, another exception occurred:
2023-08-03 14:54:27.272 Traceback (most recent call last):
  File "/usr/local/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/executors/kubernetes_executor.py", line 105, in run
    self.resource_version = self._run(
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/executors/kubernetes_executor.py", line 161, in _run
    for event in self._pod_events(kube_client=kube_client, query_kwargs=kwargs):
  File "/home/airflow/.local/lib/python3.10/site-packages/kubernetes/watch/watch.py", line 165, in stream
    for line in iter_resp_lines(resp):
  File "/home/airflow/.local/lib/python3.10/site-packages/kubernetes/watch/watch.py", line 56, in iter_resp_lines
    for seg in resp.stream(amt=None, decode_content=False):
  File "/home/airflow/.local/lib/python3.10/site-packages/urllib3/response.py", line 624, in stream
    for line in self.read_chunked(amt, decode_content=decode_content):
  File "/home/airflow/.local/lib/python3.10/site-packages/urllib3/response.py", line 816, in read_chunked
    with self._error_catcher():
  File "/usr/local/lib/python3.10/contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/home/airflow/.local/lib/python3.10/site-packages/urllib3/response.py", line 461, in _error_catcher
    raise ProtocolError("Connection broken: %r" % e, e)
urllib3.exceptions.ProtocolError: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))
[2023-08-03T11:54:27.869+0000] {kubernetes_executor.py:335} ERROR - Error while health checking kube watcher process for namespace airflow. Process died for unknown reasons
```

I am not sure about the implications of this, but these errors show up every time I need to investigate something, which makes debugging harder. In the best case this is harmless noise that only clutters the logs; in the worst case it causes a problem that I haven't been able to identify yet.

### What you think should happen instead

There should be no such log; this looks like unexpected behavior.
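For reference, here is a rough sketch of the kind of recovery I would expect: treat the broken chunked stream as a transient error and resume the watch from the last seen `resource_version` instead of letting the watcher process die. This is only an illustration against the plain Kubernetes Python client, not Airflow's actual `KubernetesJobWatcher`; the namespace, timeout, and in-cluster config below are assumptions on my side.

```python
from kubernetes import client, config, watch
from urllib3.exceptions import ProtocolError


def watch_pods(namespace: str = "airflow") -> None:
    """Watch pod events and survive broken chunked responses."""
    config.load_incluster_config()  # assumes the watcher runs inside the cluster
    v1 = client.CoreV1Api()
    resource_version = None
    while True:
        try:
            for event in watch.Watch().stream(
                v1.list_namespaced_pod,
                namespace=namespace,
                resource_version=resource_version,
                timeout_seconds=300,
            ):
                # Remember where we are so the watch can be resumed after a hiccup.
                resource_version = event["object"].metadata.resource_version
                # ... handle the pod event here ...
        except ProtocolError:
            # The API server (or a proxy in between) closed the chunked response
            # mid-stream; restart the watch from the last seen resource_version
            # rather than crashing. A real implementation would also handle a
            # 410 Gone response by resetting resource_version.
            continue
```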
### How to reproduce

Deploy the official helm chart v1.9.0 using the following values file:

```yaml
extraVolumeMounts: &gitsync_volume_mounts
  - name: shard1-data
    mountPath: /gitsync-client-repos

extraVolumes: &gitsync_volumes
  - name: shard1-data
    persistentVolumeClaim:
      claimName: shard1-data

# User and group of airflow user
uid: 50000
gid: 0

# Detailed default security context for airflow deployments
securityContexts:
  pod: {}
  containers: {}

# Airflow home directory
# Used for mount paths
airflowHome: /opt/airflow

# Airflow version (Used to make some decisions based on Airflow Version being deployed)
airflowVersion: "2.6.3"

# Images
images:
  airflow:
    repository: registry.gitlab.com/org/repo
    tag: "2.6.3"
    pullPolicy: IfNotPresent
  pod_template:
    # Note that `images.pod_template.repository` and `images.pod_template.tag` parameters
    # can be overridden in `config.kubernetes` section. So for these parameters to have effect
    # `config.kubernetes.worker_container_repository` and `config.kubernetes.worker_container_tag`
    # must be not set.
    repository: ~
    tag: ~
    pullPolicy: IfNotPresent
  flower:
    repository: ~
    tag: ~
    pullPolicy: IfNotPresent
  statsd:
    repository: quay.io/prometheus/statsd-exporter
    tag: v0.22.8
    pullPolicy: IfNotPresent
  redis:
    repository: redis
    tag: 7-bullseye
    pullPolicy: IfNotPresent
  gitSync:
    repository: registry.k8s.io/git-sync/git-sync
    tag: v3.6.3
    pullPolicy: IfNotPresent

# Ingress configuration
ingress:
  # Configs for the Ingress of the web Service
  web:
    # Enable web ingress resource
    enabled: true
    annotations:
      nginx.ingress.kubernetes.io/affinity: cookie
    hosts:
      - name: "airflow.mycompany.com"
    ingressClassName: "nginx"
  flower:
    enabled: false

executor: "CeleryKubernetesExecutor"

allowPodLaunching: true

env:
  - name: AIRFLOW__CORE__SECURE_MODE
    value: "True"
  - name: AIRFLOW__CORE__PARALLELISM
    value: "25"
  - name: AIRFLOW__CORE__MAX_ACTIVE_TASKS_PER_DAG
    value: "12"
  - name: AIRFLOW__CORE__MAX_ACTIVE_RUNS_PER_DAG
    value: "1"
  - name: AIRFLOW__CORE__DAGBAG_IMPORT_TIMEOUT
    value: "60.0"
  - name: AIRFLOW__CELERY_BROKER_TRANSPORT_OPTIONS__VISIBILITY_TIMEOUT
    value: "64800"
  - name: AIRFLOW__CELERY__WORKER_CONCURRENCY
    value: "8"
  - name: AIRFLOW__API__AUTH_BACKENDS
    value: "airflow.api.auth.backend.basic_auth"
  - name: AIRFLOW__SCHEDULER__TASK_QUEUED_TIMEOUT
    value: "1200.0"

# Enables selected built-in secrets that are set via environment variables by default.
# Those secrets are provided by the Helm Chart secrets by default but in some cases you
# might want to provide some of those variables with _CMD or _SECRET variable, and you should
# in this case disable setting of those variables by setting the relevant configuration to false.
enableBuiltInSecretEnvVars:
  AIRFLOW__CORE__FERNET_KEY: true
  # For Airflow <2.3, backward compatibility; moved to [database] in 2.3
  AIRFLOW__CORE__SQL_ALCHEMY_CONN: true
  AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: true
  AIRFLOW_CONN_AIRFLOW_DB: true
  AIRFLOW__WEBSERVER__SECRET_KEY: true
  AIRFLOW__CELERY__CELERY_RESULT_BACKEND: true
  AIRFLOW__CELERY__RESULT_BACKEND: true
  AIRFLOW__CELERY__BROKER_URL: true
  AIRFLOW__ELASTICSEARCH__HOST: true
  AIRFLOW__ELASTICSEARCH__ELASTICSEARCH_HOST: true

# Airflow database & redis config
data:
  metadataSecretName: airflow-metadata-db-connection
  brokerUrlSecretName: airflow-celery-redis

fernetKey: my-fernet-key
webserverSecretKey: my-webserver-key

# Airflow Worker Config
workers:
  # Number of airflow celery workers in StatefulSet
  replicas: 6
  persistence:
    # Enable persistent volumes
    enabled: false
    # Volume size for worker StatefulSet
    size: 50Gi
    # If using a custom storageClass, pass name ref to all statefulSets here
    storageClassName: nfs
  resources:
    limits:
      memory: 3000Mi
    requests:
      cpu: "500m"
      memory: 1800Mi
  extraVolumeMounts: *gitsync_volume_mounts
  extraVolumes: *gitsync_volumes
  logGroomerSidecar:
    # Whether to deploy the Airflow worker log groomer sidecar.
    enabled: false
  env: []

# Airflow scheduler settings
scheduler:
  replicas: 1
  podDisruptionBudget:
    enabled: true
  resources:
    limits:
      memory: "3Gi"
    requests:
      cpu: 500m
      memory: "1200Mi"
  extraVolumes: []
  extraVolumeMounts: []
  logGroomerSidecar:
    enabled: true
    retentionDays: 90
    resources:
      limits:
        cpu: 100m
        memory: 128Mi
      requests:
        cpu: 100m
        memory: 128Mi

# Airflow webserver settings
webserver:
  # Number of webservers
  replicas: 2
  podDisruptionBudget:
    enabled: true
    # config:
    #   minAvailable: 1
  networkPolicy:
    ingress:
      # Peers for webserver NetworkPolicy ingress
      from: []
      # Ports for webserver NetworkPolicy ingress (if `from` is set)
      ports:
        - port: "{{ .Values.ports.airflowUI }}"
  resources:
    requests:
      cpu: "1"
      memory: "1500Mi"
    limits:
      memory: "2200Mi"
  # Create initial user.
  defaultUser:
    enabled: false
  # Launch additional containers into webserver.
  extraContainers: []
  # Add additional init containers into webserver.
  extraInitContainers: []

# Airflow Triggerer Config
triggerer:
  enabled: true
  # Number of airflow triggerers in the deployment
  replicas: 1
  persistence:
    enabled: false
  extraVolumeMounts: *gitsync_volume_mounts
  extraVolumes: *gitsync_volumes
  resources:
    limits:
      memory: 2000Mi
    requests:
      cpu: 250m
      memory: 1200Mi
  logGroomerSidecar:
    enabled: false

# Airflow Dag Processor Config
dagProcessor:
  enabled: true
  replicas: 1
  resources:
    limits:
      memory: 1800Mi
    requests:
      cpu: 1
      memory: 1500Mi
  # Mount additional volumes into dag processor.
  extraVolumeMounts: *gitsync_volume_mounts
  extraVolumes: *gitsync_volumes

# StatsD settings
statsd:
  enabled: false

# Configuration for the redis provisioned by the chart
redis:
  enabled: false

registry:
  secretName: image-pull-creds

# Define any ResourceQuotas for namespace
quotas: {}

# Define default/max/min values for pods and containers in namespace
limits: []

# This runs as a CronJob to cleanup old pods.
cleanup:
  enabled: false
  # Run every 15 minutes
  schedule: "*/15 * * * *"
  # Command to use when running the cleanup cronjob (templated).
  command: ~
  # Args to use when running the cleanup cronjob (templated).
  args:
    [
      "bash",
      "-c",
      "exec airflow kubernetes cleanup-pods --namespace={{ .Release.Namespace }}",
    ]
  # jobAnnotations are annotations on the cleanup CronJob
  jobAnnotations: {}
  # Select certain nodes for airflow cleanup pods.
  nodeSelector: {}
  affinity: {}
  tolerations: []
  topologySpreadConstraints: []
  podAnnotations: {}
  # Labels specific to cleanup objects and pods
  labels: {}
  resources: {}
  # limits:
  #   cpu: 100m
  #   memory: 128Mi
  # requests:
  #   cpu: 100m
  #   memory: 128Mi
  # Create ServiceAccount
  serviceAccount:
    # Specifies whether a ServiceAccount should be created
    create: true
    # The name of the ServiceAccount to use.
    # If not set and create is true, a name is generated using the release name
    name: ~
    # Annotations to add to cleanup cronjob kubernetes service account.
    annotations: {}
  # When not set, the values defined in the global securityContext will be used
  securityContext: {}
  # runAsUser: 50000
  # runAsGroup: 0
  env: []
  # Specify history limit
  # When set, overwrite the default k8s number of successful and failed CronJob executions that are saved.
  failedJobsHistoryLimit: ~
  successfulJobsHistoryLimit: ~

# Configuration for postgresql subchart
# Not recommended for production
postgresql:
  enabled: false

# Config settings to go into the mounted airflow.cfg
#
# Please note that these values are passed through the `tpl` function, so are
# all subject to being rendered as go templates. If you need to include a
# literal `{{` in a value, it must be expressed like this:
#
#    a: '{{ "{{ not a template }}" }}'
#
# Do not set config containing secrets via plain text values, use Env Var or k8s secret object
# yamllint disable rule:line-length
config:
  core:
    dags_folder: '{{ include "airflow_dags" . }}'
    # This is ignored when used with the official Docker image
    load_examples: "False"
    executor: "{{ .Values.executor }}"
    # For Airflow 1.10, backward compatibility; moved to [logging] in 2.0
    colored_console_log: "False"
    remote_logging: '{{- ternary "True" "False" .Values.elasticsearch.enabled }}'
  logging:
    remote_logging: '{{- ternary "True" "False" .Values.elasticsearch.enabled }}'
    colored_console_log: "False"
  webserver:
    enable_proxy_fix: "True"
    warn_deployment_exposure: "False"
    # For Airflow 1.10
    rbac: "True"
  celery:
    flower_url_prefix: "{{ .Values.ingress.flower.path }}"
    worker_concurrency: 8
  scheduler:
    standalone_dag_processor: '{{ ternary "True" "False" .Values.dagProcessor.enabled }}'
    # statsd params included for Airflow 1.10 backward compatibility; moved to [metrics] in 2.0
    statsd_on: '{{ ternary "True" "False" .Values.statsd.enabled }}'
    statsd_port: 9125
    statsd_prefix: airflow
    statsd_host: '{{ printf "%s-statsd" .Release.Name }}'
    # `run_duration` included for Airflow 1.10 backward compatibility; removed in 2.0.
    run_duration: 41460
  celery_kubernetes_executor:
    kubernetes_queue: "kubernetes"
  # The `kubernetes` section is deprecated in Airflow >= 2.5.0 due to an airflow.cfg schema change.
  # The `kubernetes` section can be removed once the helm chart no longer supports Airflow < 2.5.0.
  kubernetes:
    namespace: "{{ .Release.Namespace }}"
    # The following `airflow_` entries are for Airflow 1, and can be removed when it is no longer supported.
    airflow_configmap: '{{ include "airflow_config" . }}'
    airflow_local_settings_configmap: '{{ include "airflow_config" . }}'
    pod_template_file: '{{ include "airflow_pod_template_file" . }}/pod_template_file.yaml'
    worker_container_repository: "{{ .Values.images.airflow.repository | default .Values.defaultAirflowRepository }}"
    worker_container_tag: "{{ .Values.images.airflow.tag | default .Values.defaultAirflowTag }}"
    multi_namespace_mode: '{{ ternary "True" "False" .Values.multiNamespaceMode }}'
  # The `kubernetes_executor` section duplicates the `kubernetes` section in Airflow >= 2.5.0 due to an airflow.cfg schema change.
  kubernetes_executor:
    namespace: "{{ .Release.Namespace }}"
    pod_template_file: '{{ include "airflow_pod_template_file" . }}/pod_template_file.yaml'
    worker_container_repository: "{{ .Values.images.airflow.repository | default .Values.defaultAirflowRepository }}"
    worker_container_tag: "{{ .Values.images.airflow.tag | default .Values.defaultAirflowTag }}"
    multi_namespace_mode: '{{ ternary "True" "False" .Values.multiNamespaceMode }}'
# yamllint enable rule:line-length

# Whether Airflow can launch workers and/or pods in multiple namespaces
# If true, it creates ClusterRole/ClusterRolebinding (with access to entire cluster)
multiNamespaceMode: false

# `podTemplate` is a templated string containing the contents of `pod_template_file.yaml` used for
# KubernetesExecutor workers. The default `podTemplate` will use normal `workers` configuration parameters
# (e.g. `workers.resources`). As such, you normally won't need to override this directly, however,
# you can still provide a completely custom `pod_template_file.yaml` if desired.
# If not set, a default one is created using `files/pod-template-file.kubernetes-helm-yaml`.
podTemplate: ~
# The following example is NOT functional, but meant to be illustrative of how you can provide a custom
# `pod_template_file`. You're better off starting with the default in
# `files/pod-template-file.kubernetes-helm-yaml` and modifying from there.
# We will set `priorityClassName` in this example:
# podTemplate: |
#   apiVersion: v1
#   kind: Pod
#   metadata:
#     name: placeholder-name
#     labels:
#       tier: airflow
#       component: worker
#       release: {{ .Release.Name }}
#   spec:
#     priorityClassName: high-priority
#     containers:
#       - name: base
#         ...

# Git sync
dags:
  persistence:
    # Annotations for dags PVC
    annotations: {}
    # Enable persistent volume for storing dags
    enabled: false
    # Volume size for dags
    size: 1Gi
    # If using a custom storageClass, pass name here
    storageClassName: nfs
    # access mode of the persistent volume
    accessMode: ReadWriteMany
    ## the name of an existing PVC to use
    existingClaim:
    ## optional subpath for dag volume mount
    subPath: ~
  gitSync:
    enabled: true
    # git repo clone url
    # ssh example: g...@github.com:apache/airflow.git
    # https example: https://github.com/apache/airflow.git
    repo: g...@github.com:bruin-data/dags.git
    branch: main
    rev: HEAD
    depth: 1
    # the number of consecutive failures allowed before aborting
    maxFailures: 1
    # subpath within the repo where dags are located
    # should be "" if dags are at repo root
    subPath: ""
    sshKeySecret: airflow-gitsync-dags-clone
    knownHosts: |-
      github.com ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQCj7ndNxQowgcQnjshcLrqPEiiphnt+VTTvDP6mHBL9j1aNUkY4Ue1gvwnGLVlOhGeYrnZaMgRK6+PKCUXaDbC7qtbW8gIkhL7aGCsOr/C56SJMy/BCZfxd1nWzAOxSDPgVsmerOBYfNqltV9/hWCqBywINIR+5dIg6JTJ72pcEpEjcYgXkE2YEFXV1JHnsKgbLWNlhScqb2UmyRkQyytRLtL+38TGxkxCflmO+5Z8CSSNY7GidjMIZ7Q4zMjA2n1nGrlTDkzwDCsw+wqFPGQA179cnfGWOWRVruj16z6XyvxvjJwbz0wQZ75XK5tKSb7FNyeIEs4TT4jk+S4dhPeAUC5y+bDYirYgM4GC7uEnztnZyaVWQ7B381AK4Qdrwt51ZqExKbQpTUNn+EjqoTwvqNj4kqx5QUCI0ThS/YkOxJCXmPUWZbhjpCg56i+2aB6CmK2JGhn57K5mj0MNdBXA4/WnwH6XoPWJzK5Nyu2zB3nAZp+S5hpQs+p1vN1/wsjk=
      gitlab.com ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCsj2bNKTBSpIYDEGk9KxsGh3mySTRgMtXL583qmBpzeQ+jqCMRgBqB98u3z++J1sKlXHWfM9dyhSevkMwSbhoR8XIq/U0tCNyokEi/ueaBMCvbcTHhO7FcwzY92WK4Yt0aGROY5qX2UKSeOvuP4D6TPqKF1onrSzH9bx9XUf2lEdWT/ia1NEKjunUqu1xOB/StKDHMoX4/OKyIzuS0q/T1zOATthvasJFoPrAjkohTyaDUz2LN5JoH839hViyEG82yB+MjcFV5MU3N1l1QL3cVUCh93xSaua1N85qivl+siMkPGbO5xR/En4iEY6K2XPASUEMaieWVNTRCtJ4S8H+9
      bitbucket.org ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAubiN81eDcafrgMeLzaFPsw2kNvEcqTKl/VqLat/MaB33pZy0y3rJZtnqwR2qOOvbwKZYKiEO1O6VqNEBxKvJJelCq0dTXWT5pbO2gDXC6h6QDXCaHo6pOHGPUy+YBaGQRGuSusMEASYiWunYN0vCAI8QaXnWMXNMdFP3jHAJH0eDsoiGnLPBlBp4TNm6rYI74nMzgz3B9IikW4WVK+dc8KZJZWYjAuORU3jc1c/NPskD2ASinf8v3xnfXeukU0sJ5N6m5E8VLjObPEO+mN2t/FZTMZLiFqPWc/ALSqnMnnhwrNi2rbfg/rd/IpL8Le3pSBne8+seeFVBoGqzHM9yXw==
    # interval between git sync attempts in seconds
    # high values are more likely to cause DAGs to become out of sync between different components
    # low values cause more traffic to the remote git repository
    wait: 5
    containerName: git-sync
    uid: 65533
    extraVolumeMounts: []
    env:
      - name: GIT_SYNC_SUBMODULES
        value: "off"
    resources:
      limits:
        memory: 180Mi
      requests:
        cpu: 100m
        memory: 128Mi

logs:
  persistence:
    enabled: true
    size: 50Gi
    storageClassName: nfs
```

### Operating System

Debian GNU/Linux 11 (bullseye)

### Versions of Apache Airflow Providers

apache-airflow-providers-amazon==8.1.0
apache-airflow-providers-celery==3.2.1
apache-airflow-providers-cncf-kubernetes==7.4.0
apache-airflow-providers-common-sql==1.6.1
apache-airflow-providers-discord==3.2.0
apache-airflow-providers-docker==3.7.1
apache-airflow-providers-elasticsearch==4.5.1
apache-airflow-providers-ftp==3.4.2
apache-airflow-providers-google==10.5.0
apache-airflow-providers-grpc==3.2.1
apache-airflow-providers-hashicorp==3.4.1
apache-airflow-providers-http==4.4.2
apache-airflow-providers-imap==3.2.2
apache-airflow-providers-microsoft-azure==6.1.2
apache-airflow-providers-mysql==5.1.1
apache-airflow-providers-odbc==4.0.0
apache-airflow-providers-postgres==5.5.1
apache-airflow-providers-redis==3.2.1
apache-airflow-providers-sendgrid==3.2.1
apache-airflow-providers-sftp==4.3.1
apache-airflow-providers-slack==7.3.2
apache-airflow-providers-snowflake==4.4.0
apache-airflow-providers-sqlite==3.4.2
apache-airflow-providers-ssh==3.7.1
apache-airflow-providers-tableau==4.2.0

### Deployment

Official Apache Airflow Helm Chart

### Deployment details

Kubernetes v1.27.2

### Anything else

Literally every 10 minutes:

<img width="1070" alt="image" src="https://github.com/apache/airflow/assets/16530606/709101ca-cda4-4803-9872-67fb7ec66b32">

### Are you willing to submit PR?

- [X] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)