potiuk commented on issue #44984: URL: https://github.com/apache/airflow/issues/44984#issuecomment-2556786442
> What do you think of switching to a ThreadPoolExecutor instead of a [ProcessPoolExecutor](https://github.com/apache/airflow/blob/providers-openlineage/1.14.0/providers/src/airflow/providers/openlineage/plugins/listener.py#L398)? This would eliminate the need for serialization while still allowing asynchronous execution.

From what I understand, there were MANY problems with the previous implementation using ThreadPoolExecutor. The problem is that threads are a very flawed concept in Python due to the GIL, and spawning new threads without full control over the other running threads and what they do (especially when low-level C code implemented in some libraries is called from Python code) introduces a lot of contention, deadlock possibilities, and various kinds of errors, especially if those libraries are not written in a fully thread-safe way. I think @kacpermuda and @mobuchowski had a LOT of problems with the Snowflake integration caused by this.
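
To make the serialization side of this trade-off concrete, here is a minimal sketch (not the listener's actual code). It relies on the documented behavior that `ProcessPoolExecutor` can only accept picklable arguments, while `ThreadPoolExecutor` passes objects by reference. The `Event` class and the `emit`, `is_picklable`, and `run_in_thread` helpers are hypothetical names made up for illustration.

```python
import pickle
import threading
from concurrent.futures import ThreadPoolExecutor


class Event:
    """Hypothetical payload that carries unpicklable state (a lock)."""

    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()  # lock objects cannot be pickled


def emit(event):
    # Placeholder for the real work, e.g. building and sending an event.
    with event.lock:
        return f"emitted {event.name}"


def is_picklable(obj):
    # ProcessPoolExecutor has to pickle arguments to ship them to a worker
    # process, so this check approximates "can this be submitted to it?".
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError, AttributeError):
        return False


def run_in_thread(event):
    # Threads share the parent's memory, so no serialization is needed.
    with ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(emit, event).result()
```

The sketch only covers the serialization argument from the quoted question. It does not exercise the contention and deadlock risks described in the reply, which depend on what the other threads and any C extensions in the process are doing.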
