janniklinde opened a new pull request, #2346: URL: https://github.com/apache/systemds/pull/2346
This patch introduces a new failure-propagation mechanism for out-of-core (OOC) tasks via the `LocalTaskQueue`. Previously, unexpected exceptions in OOC tasks could fail silently, leaving upstream tasks waiting indefinitely because their output streams were never closed.

To address this, we now propagate exceptions through the queue hierarchy, ensuring upstream and downstream threads are properly interrupted. `LocalTaskQueue` maintains an exception state, and both enqueue and dequeue operations rethrow the stored exception, which propagates errors across dependent queues. When a failure occurs, all related queues are notified, cascading the exception until it reaches the main thread and any other affected tasks.

Additionally, a common OOC task submission method was added to `OOCInstruction` to replace manual submission via `CommonThreadPool`. This ensures consistent exception propagation and simplifies OOC task management.
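
For readers unfamiliar with the pattern, the following is a minimal sketch of the idea described above, not the code in this patch. The names `FailureAwareQueue`, `link`, `fail`, and `OocTaskSketch.submit` are hypothetical and chosen only for illustration; the actual `LocalTaskQueue` and `OOCInstruction` signatures may differ. It assumes a queue that stores the first failure, rethrows it from both enqueue and dequeue, and cascades it to linked queues.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Hypothetical stand-in for a task queue with an exception state.
class FailureAwareQueue<T> {
    private final Queue<T> items = new ArrayDeque<>();
    private final List<FailureAwareQueue<?>> linked = new ArrayList<>();
    private boolean closed = false;
    private Throwable failure = null;

    // Register a dependent queue that must fail when this one fails.
    public synchronized void link(FailureAwareQueue<?> other) {
        linked.add(other);
    }

    public synchronized void enqueue(T item) {
        rethrowIfFailed();
        if (closed)
            throw new IllegalStateException("queue already closed");
        items.add(item);
        notifyAll();
    }

    // Returns the next item, or null once the queue is closed and drained.
    public synchronized T dequeue() throws InterruptedException {
        while (items.isEmpty() && !closed && failure == null)
            wait();
        rethrowIfFailed();
        return items.poll();
    }

    public synchronized void close() {
        closed = true;
        notifyAll();
    }

    // Store the first failure, wake all waiters, then cascade to linked queues.
    public void fail(Throwable t) {
        List<FailureAwareQueue<?>> targets;
        synchronized (this) {
            if (failure != null)
                return; // already failed; also terminates cyclic links
            failure = t;
            notifyAll();
            targets = new ArrayList<>(linked);
        }
        // Cascade outside the lock to avoid holding two queue locks at once.
        for (FailureAwareQueue<?> q : targets)
            q.fail(t);
    }

    private void rethrowIfFailed() {
        if (failure != null)
            throw new RuntimeException("OOC task failed", failure);
    }
}

// Hypothetical common submission helper: any exception fails the given queues.
class OocTaskSketch {
    static Future<?> submit(ExecutorService pool, Runnable task,
                            FailureAwareQueue<?>... queues) {
        return pool.submit(() -> {
            try {
                task.run();
            } catch (Throwable t) {
                for (FailureAwareQueue<?> q : queues)
                    q.fail(t);
            }
        });
    }
}
```

In this sketch a consumer blocked in `dequeue()` wakes up and sees the stored exception instead of waiting for a close that never arrives, and routing every task through one `submit` helper means no task can bypass the failure handling.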
