ron-gaist commented on issue #60261: URL: https://github.com/apache/airflow/issues/60261#issuecomment-3743566962
After investigating further, I found from the traceback that the failure occurs in the `_get_api_endpoint` function in `airflow.providers.edge3.plugins.edge_executor_plugin`. That function tries to acquire a lock and run migrations on every `api-server` process startup (we had 8 `api-server` replicas with 64 workers each). When many `api-server` processes start at the same time, most of them time out while waiting on `pg_advisory_lock`. Those processes then fail to load the Edge API endpoints, which is what causes the 405 error. Based on my investigation so far, the fix seems to be moving the migrations to Alembic and running them only once via the `migrateDatabaseJob`.
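
To show why most processes would time out, here is a rough back-of-the-envelope model in Python. It is not the Edge provider's code. It assumes that all processes request one exclusive lock at the same moment, that each holder keeps it for the same amount of time, and that a waiter gives up after a fixed timeout. The function name `count_lock_outcomes` and both timing values are hypothetical; only the 8 x 64 process count comes from our setup.

```python
def count_lock_outcomes(
    num_processes: int, hold_seconds: float, timeout_seconds: float
) -> tuple[int, int]:
    """Return (succeeded, timed_out) for processes that all request one
    exclusive lock at t=0 and are served one at a time."""
    succeeded = 0
    for position in range(num_processes):
        # Time spent queued behind the earlier holders (assumed equal hold times).
        wait = position * hold_seconds
        if wait <= timeout_seconds:
            succeeded += 1
    return succeeded, num_processes - succeeded


# 8 replicas x 64 workers, as in our deployment; the timings are made up.
PROCESSES = 8 * 64
ok, failed = count_lock_outcomes(PROCESSES, hold_seconds=0.5, timeout_seconds=30.0)
print(f"{ok} of {PROCESSES} processes acquire the lock; {failed} time out")
```

Under these assumptions the number of successful startups is capped by roughly `timeout_seconds / hold_seconds`, regardless of how many processes start, which is why running the migration once outside of process startup looks like the better option.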
