potiuk commented on issue #28275:
URL: https://github.com/apache/airflow/issues/28275#issuecomment-1354013300

   Very useful to see those logs.
   
   I think you might be RIGHT - it might be `pytest-timeout` that actually 
triggers the problem, not the other way round.
   
   I am usually quite uneasy when I have no plausible hypothesis for why things 
are happening.
   
   And in this case, I do not know the exact reason, but I think we can get much 
closer to solving the problem and possibly even eliminate similar problems in 
the future.
   
   I believe the problem might come from the fact that `pytest-timeout` uses 
SIGALRM. Or rather that we ALSO use SIGALRM in Airflow:
   
   This is the timeout context manager we use in Airflow:
   
   ```
   class TimeoutPosix(_timeout, LoggingMixin):
       """POSIX Timeout version: To be used in a ``with`` block and timeout its content."""
   
       def __init__(self, seconds=1, error_message="Timeout"):
           super().__init__()
           self.seconds = seconds
           self.error_message = error_message + ", PID: " + str(os.getpid())
   
       def handle_timeout(self, signum, frame):
           """Logs information and raises AirflowTaskTimeout."""
           self.log.error("Process timed out, PID: %s", str(os.getpid()))
           raise AirflowTaskTimeout(self.error_message)
   
       def __enter__(self):
           try:
               signal.signal(signal.SIGALRM, self.handle_timeout)
               signal.setitimer(signal.ITIMER_REAL, self.seconds)
           except ValueError:
               self.log.warning("timeout can't be used in the current context", exc_info=True)
   
       def __exit__(self, type_, value, traceback):
           try:
               signal.setitimer(signal.ITIMER_REAL, 0)
           except ValueError:
               self.log.warning("timeout can't be used in the current context", exc_info=True)
   ```
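
The handler registered with `signal.signal` and the `ITIMER_REAL` timer are both process-wide, so two SIGALRM users in one process share them. A minimal sketch of that sharing follows, for POSIX only and run from the main thread. The names `outer_handler`, `inner_handler` and `demo` are hypothetical stand-ins, not Airflow or pytest-timeout code, and the sketch only shows that the state is shared, not what actually happens in our CI.

```python
import signal


def outer_handler(signum, frame):
    # Hypothetical stand-in for a test runner's timeout handler.
    raise RuntimeError("outer timeout fired")


def inner_handler(signum, frame):
    # Hypothetical stand-in for TimeoutPosix.handle_timeout.
    raise TimeoutError("inner timeout fired")


def demo():
    """Return what the outer SIGALRM user sees after the inner one has run."""
    seen = {}
    # Outer user: install a handler and arm a 60-second timer.
    saved_handler = signal.signal(signal.SIGALRM, outer_handler)
    saved_timer = signal.setitimer(signal.ITIMER_REAL, 60)
    try:
        # Inner user: the same two calls that __enter__ makes.
        signal.signal(signal.SIGALRM, inner_handler)
        signal.setitimer(signal.ITIMER_REAL, 30)
        # Inner user: the same call that __exit__ makes (disarm the timer).
        signal.setitimer(signal.ITIMER_REAL, 0)

        # The outer handler has been replaced and the outer timer is disarmed.
        seen["handler_is_inner"] = signal.getsignal(signal.SIGALRM) is inner_handler
        seen["seconds_left"] = signal.getitimer(signal.ITIMER_REAL)[0]
    finally:
        # Put back whatever was installed before the demo ran.
        if saved_handler is None:
            saved_handler = signal.SIG_DFL
        signal.signal(signal.SIGALRM, saved_handler)
        signal.setitimer(signal.ITIMER_REAL, *saved_timer)
    return seen
```

Calling `demo()` shows that whichever code touches SIGALRM last wins: the handler is the inner one, and no timer is left armed for the outer user.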
   
   And while I am not sure how (and if?) `with timeout` is used in the process, 
this could explain it: `with timeout` is interrupted by the SIGALRM from pytest, 
AirflowTaskTimeout is raised, and something somewhere re-runs the task and adds 
it to the queue (but I am not sure what that something could be; I could not 
trace the retrying/re-adding of the task back to AirflowTaskTimeout). It might 
also be caused by the way pytest manages threads/forking (signals are handled 
in the main thread only, even if they are received in another thread): 
https://docs.python.org/3.4/library/signal.html#execution-of-python-signal-handlers
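
The main-thread restriction also explains the `except ValueError` branches in the class above: Python only allows `signal.signal` to be called from the main thread and raises `ValueError` elsewhere. A small sketch of that behaviour follows. The function names are hypothetical, and it says nothing about which thread our tests actually run in.

```python
import signal
import threading


def try_install_handler():
    """Attempt the same signal.signal call that __enter__ makes."""
    try:
        signal.signal(signal.SIGALRM, lambda signum, frame: None)
    except ValueError as exc:
        # Raised when the caller is not the main thread.
        return f"ValueError: {exc}"
    return "installed"


def outcome_in_worker_thread():
    """Run try_install_handler in a non-main thread and return its result."""
    result = []
    worker = threading.Thread(target=lambda: result.append(try_install_handler()))
    worker.start()
    worker.join()
    return result[0]
```

In a worker thread the timeout is therefore never armed at all, and only the warning is logged.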
   
   
   In this case I think the theory is plausible enough that we need not try to 
chase it down fully, but can just increase the timeout and see if that fixes 
the problem. It was appearing frequently enough to notice quite easily in our 
CI, so if it goes away after increasing the timeout, we will have a plausible 
theory.
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
