dabla commented on issue #45930:
URL: https://github.com/apache/airflow/issues/45930#issuecomment-2685159279

   > This is most likely a question of signal handling. Airflow's timeout is 
done using SIGALRM. If the driver runs low-level C code that does not handle 
signals properly, this behaviour can happen. A proper implementation of 
C-library code periodically checks in a loop whether a signal has been 
received and passes control back to Python if it has - but some C-level code 
(especially database drivers) does not handle this properly.
   > 
   > The solution - discussed a few times - is to make another fork of the 
process and handle the timeout in that separate fork. We decided to defer it 
until after the new Task SDK is completed - and maybe it's already handled in 
the new Task SDK? [@ashb](https://github.com/ashb) 
[@amoghrajesh](https://github.com/amoghrajesh) ?
   > 
   > Generally we already have two processes (parent and fork):
   > 
   > a) a supervisor that performs the heartbeat
   > b) the task that executes the job
   > 
   > There was also c) OpenLineage handling.
   > 
   > In Airflow 2 the timeout was implemented as SIGALRM handling inside b), 
which means that if b) executed badly behaving C code, the timeout did not 
happen. In Airflow 3 heartbeating and task execution work somewhat differently 
and are still being iterated on. The options we could implement are to move 
the timeout handling to a), or to introduce another fork with the timeout 
between a) and b). The former is likely better from a resource-usage point of 
view.
   
   Thank you Jarek, this indeed seems to be a probable explanation. It proves 
that I still have a lot to learn about Airflow's inner workings :(
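   For anyone following along, the SIGALRM mechanism described above can be 
sketched roughly like this (a minimal illustration with made-up names such as 
`run_with_timeout` and `TaskTimeout`; this is not Airflow's actual code). Note 
the limitation Jarek points out: the Python-level handler only runs when 
control returns to the interpreter, so a blocking C call that never yields can 
delay the timeout indefinitely.

```python
import signal


class TaskTimeout(Exception):
    """Raised when the SIGALRM deadline fires (hypothetical name)."""


def _handle_alarm(signum, frame):
    # Invoked by the interpreter once control is back in Python bytecode.
    raise TaskTimeout("task exceeded its timeout")


def run_with_timeout(fn, seconds):
    # Install a SIGALRM handler and arm the alarm. If fn blocks inside
    # C code that never checks for pending signals, this handler cannot
    # run until that C call returns - which is exactly the failure mode
    # discussed in this issue.
    old = signal.signal(signal.SIGALRM, _handle_alarm)
    signal.alarm(seconds)
    try:
        return fn()
    finally:
        signal.alarm(0)                       # disarm the pending alarm
        signal.signal(signal.SIGALRM, old)    # restore previous handler
```

With a well-behaved blocking call such as `time.sleep`, the alarm interrupts 
it and `TaskTimeout` is raised; with signal-blind C code it may not.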

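The fork-based alternative mentioned in this thread can be sketched roughly as 
follows (hypothetical helper `run_in_fork_with_timeout`, assuming a POSIX 
system; again not Airflow's implementation). The key property is that SIGKILL 
cannot be ignored, so the parent can always stop the child even when the child 
is stuck inside C code that never checks for signals.

```python
import os
import signal
import time


def run_in_fork_with_timeout(fn, seconds):
    # Fork: the child runs the task, the parent enforces the deadline.
    pid = os.fork()
    if pid == 0:
        fn()
        os._exit(0)

    # Parent: poll for the child's exit until the deadline passes.
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        done, status = os.waitpid(pid, os.WNOHANG)
        if done:
            return os.waitstatus_to_exitcode(status)
        time.sleep(0.05)

    # Timed out: SIGKILL cannot be blocked or ignored by the child.
    os.kill(pid, signal.SIGKILL)
    os.waitpid(pid, 0)  # reap the zombie
    raise TimeoutError(f"task did not finish within {seconds}s")
```

A real supervisor would also forward other signals and propagate the child's 
logs, but the sketch shows why enforcing the timeout outside the task process 
sidesteps the SIGALRM problem entirely.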

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
