[GitHub] [airflow] vandonr-amz opened a new pull request, #30134: add a describe step on failure in eks system tests

via GitHub Wed, 15 Mar 2023 12:29:39 -0700


vandonr-amz opened a new pull request, #30134:
URL: https://github.com/apache/airflow/pull/30134


   Every once in a while, an EKS system test fails with a timeout while 
waiting for the pod to start. 
   Since we delete all resources in the teardown, we have nothing left to 
inspect afterwards, and the failures appear to be random, so we cannot 
reproduce them in a controlled environment.
   
   With this change, we'll at least get a pod description when there is a 
failure, which should help us understand the root cause.
   
   This step runs only on failure (`trigger_rule=TriggerRule.ONE_FAILED`) and 
is expected to run after the start_pod operation has failed. An earlier step 
could also fail and trigger it, but we accept that. The main point is that we 
don't want to run it every single time because it makes the test ~1 minute 
longer (installing awscli, which is needed to get the kube config, takes quite 
a while).
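
   For illustration only, here is a rough sketch of the kind of shell command 
such a describe step could run. It is not the code from this PR: the helper 
name `build_describe_command`, the `pip install awscli` install method, and the 
cluster, pod, and namespace values are all assumptions made for the example. 
`aws eks update-kubeconfig` and `kubectl describe pod` are the standard CLI 
calls for fetching the kube config and describing a pod.

```python
import shlex


def build_describe_command(cluster_name: str, pod_name: str, namespace: str = "default") -> str:
    """Return a shell command that fetches the kube config, then describes a pod.

    Hypothetical helper for illustration; not part of Airflow or of this PR.
    """
    steps = [
        # Assumption: awscli is installed from PyPI; the PR may install it differently.
        "pip install --quiet awscli",
        # Write a kube config entry for the EKS cluster so kubectl can reach it.
        f"aws eks update-kubeconfig --name {shlex.quote(cluster_name)}",
        # Print the pod's status, events, and container states.
        f"kubectl describe pod {shlex.quote(pod_name)} --namespace {shlex.quote(namespace)}",
    ]
    # Chain with && so a failing step stops the rest of the command.
    return " && ".join(steps)


if __name__ == "__main__":
    # Placeholder names; a real test would pass its own cluster and pod names.
    print(build_describe_command("my-eks-cluster", "test-pod"))
```

   In a DAG, a string like this would presumably be passed as the 
`bash_command` of a task declared with `trigger_rule=TriggerRule.ONE_FAILED`, 
so the install cost is only paid when an upstream task has failed.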


