ron-gaist opened a new issue, #60261:
URL: https://github.com/apache/airflow/issues/60261

   ### Apache Airflow Provider(s)
   
   edge3
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-celery==3.13.0
   apache-airflow-providers-common-compat==1.8.3
   apache-airflow-providers-common-io==1.6.4
   apache-airflow-providers-common-sql=1.28.2
   apache-airflow-providers-fab==3.0.1
   apache-airflow-providers-standard==1.9.1
   apache-airflow-providers-postgres==6.4.0
   apache-airflow-providers-edge3==1.4.0
   
   ### Apache Airflow version
   
   3.1.2
   
   ### Operating System
   
   Debian GNU/Linux 12 (bookworm)
   
   ### Deployment
   
   Official Apache Airflow Helm Chart
   
   ### Deployment details
   
   Setup scale: (yes we know this is very large scale)
   8 api servers x 64 worker processes each
   9 Edge workers attempting to connect continuously (and continuously failing)
   etc... (rest are not relevant)
   
   ### What happened
   
   Upon edge worker startup we get the following error:
   > 405 Client Error: Method not allowed for url: 
http://<AIRFLOW-API-SERVER-URL>/edge_worker/v1/worker/<WORKER-NAME>
   
   Possible relevant values in the helm chart values.yaml:
   ```
   apiSecretKey: <VALUE>
   jwtSecret: <VALUE>
   config:
       edge:
            api_enabled: 'True'
            api_url: ''http://<AIRFLOW-API-SERVER-URL>/edge_worker/v1/rcpapi'
   ```
   
   This behavior seems to come and go - workers can fail on startup and after 
some time start up normally.
   
   ### What you think should happen instead
   
   _No response_
   
   ### How to reproduce
   
   Reproducing this seems to be tricky. Even for us this behavior sometimes 
exists and other times not. The best information we can give for reproducing 
this seems to be the airflow config.
   
   ### Anything else
   
   `405 Method not allowed` seems to suggest that the API servers did not start 
with the necessary portion of the edge plugin.
   This is very odd, and is the kind of behavior we'd expect if the config 
lacked `edge: api_enabled: 'True', api_url: '...'`
   However we double and triple checked that these configurations are set in 
the api servers (and on the edge worker too, for what it's worth)
   
   
   **Additional note (could be related):**
   Upon startup of each api-server we get an error related to edge plugin 
import:
   ```
   [error] Failed to import plugin edge_executor [airflow.plugins_manager] 
loc=plugins_manager.py:268
   ....
   psycop2.errors.LockNotAvailable: canceling statement due to lock timeout
    ```
   We checked the lock timeout on our postgres db and it is set to 0 (unlimited)
   
   We aren't pointing to this as the *root cause* because previously edge has 
worked for us while this plugin has been similarly not imported. But this might 
point in the right direction
   We know this isn't related to the *amount* of api_servers (we tried with 1) 
but it might be related to the amount of worker processes - we don't know for 
sure.
   
   Thank you for the time considering our issue!!
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to