abhishekshenoy commented on pull request #19740: URL: https://github.com/apache/airflow/pull/19740#issuecomment-989716007
@lwyszomi @potiuk This will solve the server error received at the start of the job. We are experiencing issues wherein Google apis fail sometimes, on connecting with Google Support they mentioned `Users are expected to see this set of errors (503) every now and then. These could be due to expected issues like busy servers, network unavailability, etc` In the above scenarios as the response is not successfully retrieved , Job Sensor fails though the cluster is running. I have an approach which is similar to the one used here but by using something like a resettable counter for any 'API Reponse Error' which on passing a threshold , only then should fail the Job Sensor. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
