The GitHub Actions job "Tests" on airflow.git has succeeded.
Run started by GitHub user potiuk (triggered by potiuk).

Head commit for run:
efdfa10500ebd80b820b614511c93c4ad7f7a7f2 / Jarek Potiuk <[email protected]>
Better handle timeouts on test failures

We used to send SIGQUIT to running containers in tests if they
were running for too long. That was supposed to handle the case
when a test was hanging in a way that pytest timeout handling could
not break individual tests. The signal was sent about 10 minutes before
the total job timeout, to give the tests time to upload log files and
print diagnostic information before the runner was killed by GitHub
Actions mechanisms.

This, however, was not enough: recently we had a number of hanging
tests that failed after 1h 58 minutes, which means that sending
SIGQUIT to the Docker containers was not effective.

This PR makes the timeout code more resilient and possibly gives us
a chance to see what is going on:

1) we are printing more diagnostic information when we attempt to
   send the signals
2) we are sending SIGTERM instead of SIGQUIT, as this seems to be
   a more standard way of stopping containers (SIGQUIT was the default
   STOPSIGNAL in the past, but SIGTERM is now more commonly used)
3) we wait 10 seconds after sending the signal, and if the containers
   are still running we send SIGKILL to them, which in theory always
   kills the containers unconditionally and gives us a chance to
   print and upload diagnostic information.
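The escalation described in the list above (print diagnostics, send SIGTERM, wait, then SIGKILL anything still running) can be sketched roughly as follows. This is a minimal Python sketch, assuming the containers are managed through the plain `docker` CLI (`docker ps -q`, `docker kill --signal`); the names `stop_containers` and `running_containers` and the injectable `run`/`sleep` hooks are hypothetical and are not the actual Airflow CI code.

```python
import subprocess
import time
from typing import Callable, Sequence

# Hypothetical hook: runs a command and returns its stdout. Injectable so the
# escalation logic can be exercised without a real Docker daemon.
Runner = Callable[[Sequence[str]], str]


def _run(cmd: Sequence[str]) -> str:
    return subprocess.run(list(cmd), check=False, capture_output=True, text=True).stdout


def running_containers(run: Runner = _run) -> list[str]:
    """Return the IDs of currently running containers (`docker ps -q`)."""
    return run(["docker", "ps", "-q"]).split()


def stop_containers(
    run: Runner = _run,
    grace_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list[str]:
    """Send SIGTERM, wait, then SIGKILL survivors. Returns IDs that needed SIGKILL."""
    ids = running_containers(run)
    if not ids:
        print("No running containers to stop")
        return []
    # Print diagnostics before signalling, so there is a record of what was running.
    print(run(["docker", "ps"]))
    print(f"Sending SIGTERM to {len(ids)} container(s)")
    run(["docker", "kill", "--signal", "SIGTERM", *ids])
    # Give the containers a grace period to shut down cleanly.
    sleep(grace_seconds)
    survivors = running_containers(run)
    if survivors:
        print(f"{len(survivors)} container(s) still running, sending SIGKILL")
        run(["docker", "kill", "--signal", "SIGKILL", *survivors])
    return survivors
```

Injecting `run` and `sleep` keeps the escalation logic testable with a fake; in CI the defaults would call the real `docker` CLI and actually wait for the grace period.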

Report URL: https://github.com/apache/airflow/actions/runs/13483104885

With regards,
GitHub Actions via GitBox
