[I] Keda does not work properly when using gitSync [airflow]

via GitHub Mon, 09 Dec 2024 11:47:09 -0800


yovio-rca opened a new issue, #44798:
URL: https://github.com/apache/airflow/issues/44798


   ### Official Helm Chart version
   
   1.15.0 (latest released)
   
   ### Apache Airflow version
   
   2.10.3
   
   ### Kubernetes Version
   
   1.31.2
   
   ### Helm Chart configuration
   
   I have the following workers section in my values.yaml:
   
   workers:  
     replicas: 1
     keda:
       enabled: true
       minReplicaCount: 1   # each worker has a Celery concurrency of 16
       maxReplicaCount: 20
     resources:
       requests:
         cpu: 100m
         memory: 1750Mi
   
   ### Docker Image customizations
   
   _No response_
   
   ### What happened
   
   I am trying to use KEDA to autoscale my Airflow workers, which use the CeleryKubernetesExecutor.
   
   I followed the instructions at https://airflow.apache.org/docs/helm-chart/stable/keda.html to install KEDA and changed my Airflow values.yaml accordingly.
   
   After applying the changes, I observed many warnings from keda-operator with the error message:
   "error parsing postgreSQL metadata: error parsing postgresql metadata: no host given"
   
   It seems that the KEDA ScaledObject is not working: its PostgreSQL scaler cannot find the database host.
   
   I checked that:
   1. The ScaledObject is created in the correct namespace (the same as my Airflow namespace).
   2. It has the correct trigger type: postgresql (I use Postgres for the Airflow metadata database).
   3. It has connectionFromEnv: AIRFLOW_CONN_AIRFLOW_DB.
   
   After further investigation, I found that the ScaledObject scaleTargetRef has another property, "envSourceContainerName", as explained in https://keda.sh/docs/2.16/reference/scaledobject-spec/#scaletargetref
   
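   This property tells KEDA the name of the container in which AIRFLOW_CONN_AIRFLOW_DB is defined. If it is not set, KEDA reads environment variables from the first container of the target workload (.spec.template.spec.containers[0]).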
   
   If I modify the airflow-worker ScaledObject by adding envSourceContainerName: worker to scaleTargetRef, it starts working properly.
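   For reference, the manual change looks roughly like the fragment below. It is a sketch that shows only the fields relevant to this issue. The object and workload name airflow-worker and the StatefulSet kind come from this report. The apiVersion values are the standard ones for KEDA v2 and apps/v1 workloads. The chart-generated query and target value are intentionally omitted.

   ```yaml
   # Relevant part of the airflow-worker ScaledObject after the manual fix
   # (chart-generated query/targetQueryValue and other fields omitted).
   apiVersion: keda.sh/v1alpha1
   kind: ScaledObject
   metadata:
     name: airflow-worker
   spec:
     scaleTargetRef:
       apiVersion: apps/v1
       kind: StatefulSet
       name: airflow-worker
       # Read env vars such as AIRFLOW_CONN_AIRFLOW_DB from the "worker" container.
       # Without this, KEDA uses containers[0], which here is not the worker.
       envSourceContainerName: worker
     triggers:
       - type: postgresql
         metadata:
           # Env var holding the connection string, resolved from the
           # container selected by envSourceContainerName.
           connectionFromEnv: AIRFLOW_CONN_AIRFLOW_DB
   ```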
   
   When I inspect my airflow-worker StatefulSet, I can see that it has 3 containers: git-sync, worker-log-groomer, and worker.
   
   I'm aware that in the chart template workers/worker-deployment.yaml the worker container is defined as the first container, but for some reason, when I apply the template, the worker container ends up as the third container in the StatefulSet. I tried deleting the airflow-worker StatefulSet and running helm install again, with the same result.
   
   In my opinion, we can't rely on the position of the worker container within the StatefulSet or Deployment. Instead, the chart should specify the worker container name in the ScaledObject scaleTargetRef.
   
   
   ### What you think should happen instead
   
   The KEDA autoscaler should work regardless of the position of the worker container in the StatefulSet or Deployment.
   
   ### How to reproduce
   
   1. Deploy Airflow with git-sync enabled
   2. Deploy KEDA into the Kubernetes cluster
   3. Configure the workers to use KEDA
   4. Observe the "no host given" warnings from keda-operator for the airflow-worker ScaledObject
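   Steps 1 and 3 could be covered by a values.yaml along these lines. This is a minimal sketch rather than the reporter's full configuration. The repository URL and branch are placeholders that do not come from the original report. executor, dags.gitSync.* and workers.keda.* are standard Airflow Helm chart values.

   ```yaml
   # Minimal values.yaml sketch for reproducing the issue.
   executor: CeleryKubernetesExecutor
   dags:
     gitSync:
       enabled: true                                # adds a git-sync container to worker pods
       repo: https://github.com/example/dags.git    # placeholder repository
       branch: main                                 # placeholder branch
   workers:
     keda:
       enabled: true
       minReplicaCount: 1
       maxReplicaCount: 20
   ```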
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.