dabla commented on issue #45930: URL: https://github.com/apache/airflow/issues/45930#issuecomment-2685159279
> This is most likely a question of signal handling. Airflow's timeout is done using SIGALRM. If the driver runs low-level C code that does not handle signals properly, this behaviour might happen. A proper implementation of C-library code is to periodically check in a loop whether a signal has been received and pass control back to Python if it has, but some C-level code (especially database drivers) might not handle this properly.
>
> The solution, discussed a few times, is to make another fork of the process and handle the timeout in that separate fork. We decided to defer this until after the new Task SDK is completed, and maybe it is already handled in the new Task SDK? [@ashb](https://github.com/ashb) [@amoghrajesh](https://github.com/amoghrajesh)?
>
> Generally we already have two processes (parent and fork):
>
> a) the supervisor, which performs the heartbeat
> b) the task, which executes the job
>
> There was also c) OpenLineage handling.
>
> In Airflow 2 the timeout was implemented as SIGALRM handling inside b), which means that if b) executed badly behaving C code, the timeout did not happen. In Airflow 3 we have a somewhat different way of heartbeating and task execution, and it is still being iterated on. The options we could implement are to move the timeout handling to a), or to introduce another fork with the timeout between a) and b). The former is likely better from a resource-usage point of view.

Thank you Jarek, this seems indeed to be a probable explanation. This proves that I still have a lot to learn about Airflow's inner workings :(
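To illustrate the mechanism Jarek describes, here is a minimal sketch (not Airflow's actual implementation) of a SIGALRM-based timeout in Python. The key limitation is that the Python-level signal handler only runs between bytecode instructions, so a C extension that blocks without returning control to the interpreter will not be interrupted until it comes back to Python:

```python
import signal

# Minimal sketch of a SIGALRM-based task timeout (Unix only).
# The handler raises an exception inside the running Python frame,
# but only once control returns to the interpreter; C code that
# blocks without checking for pending signals delays or defeats it.

class TaskTimeout(Exception):
    """Raised when the wrapped callable exceeds its time budget."""

def _alarm_handler(signum, frame):
    raise TaskTimeout("task exceeded its timeout")

def run_with_alarm(fn, seconds):
    old_handler = signal.signal(signal.SIGALRM, _alarm_handler)
    signal.alarm(seconds)          # schedule SIGALRM in `seconds`
    try:
        return fn()
    finally:
        signal.alarm(0)            # cancel any pending alarm
        signal.signal(signal.SIGALRM, old_handler)
```

`time.sleep` is an example of a well-behaved C call: it is interrupted by the signal, so `run_with_alarm(lambda: time.sleep(60), 1)` raises `TaskTimeout` after about a second. A driver spinning inside C without such checks would not be interrupted the same way.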
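The alternative discussed above, enforcing the timeout from a separate process, can be sketched as follows. This is a hypothetical illustration, not Airflow's supervisor code: the parent forks a child to run the task and kills it from the outside when the deadline passes, which works even if the child is stuck in C code that never checks for signals, because SIGKILL cannot be caught or ignored:

```python
import os
import signal
import time

# Hedged sketch of out-of-process timeout enforcement (Unix only).
# The parent polls for the child's exit and sends SIGKILL on timeout,
# so even a child blocked in signal-unaware C code is terminated.
# All names here are illustrative.

def supervise(task_fn, timeout_s, poll_s=0.1):
    pid = os.fork()
    if pid == 0:                  # child: run the task, then exit
        task_fn()
        os._exit(0)

    deadline = time.monotonic() + timeout_s
    while True:                   # parent: wait for exit or deadline
        done_pid, status = os.waitpid(pid, os.WNOHANG)
        if done_pid:
            return os.waitstatus_to_exitcode(status)
        if time.monotonic() > deadline:
            os.kill(pid, signal.SIGKILL)   # cannot be ignored
            os.waitpid(pid, 0)             # reap the killed child
            return -signal.SIGKILL
        time.sleep(poll_s)
```

This also hints at why folding the timeout into the existing supervisor process a) is attractive: the polling loop above is essentially a heartbeat loop with a deadline check added, so no extra fork is needed.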
