jscheffl commented on PR #46406: URL: https://github.com/apache/airflow/pull/46406#issuecomment-2634891945
I don't think it is a good idea to reduce the grace time before failing on HTTP 500. K8s infrastructure recovery or a DB restart most often takes longer than 3 minutes. From running Edge Worker in production, we can say that the current default settings give us the most stable behavior, and I fear that reducing them will lower the quality of service. I propose signalling functional problems with a different error code so that retries are skipped for them. If you do reduce it as in this PR, please at least make it configurable so that we can raise it back to 10 attempts; otherwise I fear stability will suffer when moving to Task SDK.
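
A minimal sketch of the two ideas in this comment, a configurable attempt count and a separate error class that skips retries, might look like the following. Everything here is hypothetical: `API_RETRIES`, `call_with_retry`, the two exception classes, and the wait values are illustrative names and numbers, not Airflow's actual configuration or Task SDK client code. It also assumes, purely for illustration, that functional problems arrive as 4xx responses.

```python
import time
from typing import Callable, Tuple

# Hypothetical default; in a real deployment this would come from configuration.
API_RETRIES = 10


class ServerError(Exception):
    """Infrastructure-level failure (e.g. HTTP 500) that outlasted all retries."""


class FunctionalError(Exception):
    """Functional failure reported with a distinct status code; never retried."""


def call_with_retry(
    request: Callable[[], Tuple[int, str]],
    retries: int = API_RETRIES,
    base_wait: float = 1.0,
    max_wait: float = 90.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call `request` until it succeeds, retrying only on 5xx responses.

    `request` returns a (status_code, body) tuple. Waits grow exponentially
    from `base_wait` and are capped at `max_wait` seconds.
    """
    for attempt in range(1, retries + 1):
        status, body = request()
        if status < 400:
            return body
        if status < 500:
            # Assumption: functional problems use a 4xx code, so retrying is pointless.
            raise FunctionalError(f"HTTP {status}: {body}")
        if attempt == retries:
            raise ServerError(f"HTTP {status} after {retries} attempts: {body}")
        sleep(min(base_wait * 2 ** (attempt - 1), max_wait))
    raise ValueError("retries must be at least 1")
```

With these illustrative values, 10 attempts leave room for roughly five minutes of outage (307 seconds of waiting in total), while a functional error fails on the first response regardless of the retry setting.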
