ron-gaist opened a new issue, #60261:
URL: https://github.com/apache/airflow/issues/60261
### Apache Airflow Provider(s)
edge3
### Versions of Apache Airflow Providers
apache-airflow-providers-celery==3.13.0
apache-airflow-providers-common-compat==1.8.3
apache-airflow-providers-common-io==1.6.4
apache-airflow-providers-common-sql==1.28.2
apache-airflow-providers-fab==3.0.1
apache-airflow-providers-standard==1.9.1
apache-airflow-providers-postgres==6.4.0
apache-airflow-providers-edge3==1.4.0
### Apache Airflow version
3.1.2
### Operating System
Debian GNU/Linux 12 (bookworm)
### Deployment
Official Apache Airflow Helm Chart
### Deployment details
Setup scale: (yes we know this is very large scale)
8 api servers x 64 worker processes each
9 Edge workers attempting to connect continuously (and continuously failing)
etc... (rest are not relevant)
### What happened
Upon edge worker startup we get the following error:
> 405 Client Error: Method not allowed for url:
http://<AIRFLOW-API-SERVER-URL>/edge_worker/v1/worker/<WORKER-NAME>
Possibly relevant values in the Helm chart values.yaml:
```
apiSecretKey: <VALUE>
jwtSecret: <VALUE>
config:
  edge:
    api_enabled: 'True'
    api_url: 'http://<AIRFLOW-API-SERVER-URL>/edge_worker/v1/rcpapi'
```
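As a quick sanity check on how the failing URL relates to the configured `api_url`, the sketch below resolves a relative path against it with Python's standard `urllib.parse.urljoin`. That the edge client builds its URLs this way is an assumption, not something confirmed here; the host `airflow-api.example` and the worker name `worker-1` are placeholders.

```python
from urllib.parse import urljoin

# Placeholder host; the path is copied as written in the values.yaml above.
API_URL = "http://airflow-api.example/edge_worker/v1/rcpapi"


def worker_endpoint(api_url: str, worker_name: str) -> str:
    """Resolve a worker path relative to api_url (assumed client behavior)."""
    # With no trailing slash on the base, urljoin replaces its last path
    # segment with the relative path.
    return urljoin(api_url, f"worker/{worker_name}")


print(worker_endpoint(API_URL, "worker-1"))
```

Under that assumption, the result has the same shape as the URL reported in the 405 error, so the request at least appears to target the expected path.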
This behavior is intermittent: workers can fail on startup and then, after
some time, start up normally.
### What you think should happen instead
_No response_
### How to reproduce
Reproducing this is tricky. Even in our environment the behavior appears at
some times and not at others. The best information we can offer for
reproducing it is the Airflow config.
### Anything else
`405 Method Not Allowed` seems to suggest that the API servers did not start
with the necessary portion of the edge plugin.
This is very odd, and is the kind of behavior we'd expect if the config
lacked `edge: api_enabled: 'True', api_url: '...'`.
However, we double- and triple-checked that these settings are present on
the API servers (and on the edge worker too, for what it's worth).
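One way to check whether the 405 depends on which API server process answers is to send the same request many times and tally the status codes. The following is a minimal diagnostic sketch using only the Python standard library; the function names are hypothetical, and the HTTP method the worker actually uses at startup is not known here, so it is left as a parameter.

```python
import collections
import urllib.error
import urllib.request


def probe_status(url: str, method: str = "GET", timeout: float = 5.0) -> int:
    """Send one request and return its HTTP status code, error statuses included."""
    request = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError; the code is what we want.
        return err.code


def tally_statuses(url, attempts=50, method="GET", probe=probe_status):
    """Repeat the same request and count how often each status code comes back."""
    counts = collections.Counter()
    for _ in range(attempts):
        # Each call opens a fresh connection, so different processes may answer.
        counts[probe(url, method)] += 1
    return counts
```

Calling `tally_statuses(url, method=...)` against the worker URL from the error message returns a `Counter` of status codes. A mix of 405 and other codes would be consistent with only some of the 64 worker processes serving the edge routes, though it would not prove it. Connection failures are not caught and will raise `urllib.error.URLError`.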
**Additional note (could be related):**
Upon startup of each api-server we get an error related to the edge plugin
import:
```
[error] Failed to import plugin edge_executor [airflow.plugins_manager]
loc=plugins_manager.py:268
....
psycopg2.errors.LockNotAvailable: canceling statement due to lock timeout
```
We checked the lock timeout on our Postgres DB and it is set to 0 (unlimited).
We aren't pointing to this as the *root cause* because edge has previously
worked for us while this plugin similarly failed to import. But this might
point in the right direction.
We know this isn't related to the *number* of api-servers (we tried with 1),
but it might be related to the number of worker processes; we don't know for
sure.
Thank you for the time considering our issue!!
### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [x] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)