AliRana30 opened a new issue, #49061: URL: https://github.com/apache/arrow/issues/49061
### Describe the enhancement requested

## Problem

Arrow has 30+ documented deadlock risks across the codebase but no centralized detection or prevention framework.

**Affected files:**

- `cpp/src/arrow/util/thread_pool.h:271,292,329,443` - Nested parallelism causes deadlock
- `cpp/src/arrow/filesystem/s3fs.cc:2098,2118` - Future callbacks deadlock (GH-41862)
- `cpp/src/arrow/filesystem/filesystem.cc:645` - Blocking Close() causes deadlocks
- `cpp/src/arrow/util/async_generator.h:1052,1746,1878` - Lock ordering issues
- `cpp/src/arrow/flight/sql/odbc/odbc_impl/flight_sql_driver.cc:43` - **Abseil deadlock detection explicitly disabled**
- `cpp/src/arrow/dataset/dataset_writer.cc:118` - Queue overflow causes deadlock
- `cpp/src/arrow/acero/asof_join_node.cc:767,1417` - Pause/backpressure deadlock
- `cpp/src/arrow/csv/reader_test.cc:130-132` - Destructor can deadlock on cleanup

**Related:** #48714

## Proposed Solution

Implement comprehensive deadlock detection and prevention framework:

1. **Runtime Detection:** Build lock ordering validator tracking mutex acquisition across threads, create cycle detector for resource dependencies
2. **Enable Abseil Detection:** Re-enable `absl::SetMutexDeadlockDetectionMode` currently disabled in Flight SQL, fix underlying issues requiring the workaround
3. **Timeout Mechanisms:** Add configurable timeouts for all blocking operations, implement timeout-based detection for suspicious long waits
4. **Thread Tracking:** Add thread state tracking with lock acquisition logging in debug builds, integrate with ThreadPool/Executor infrastructure
5. **Static Analysis:** Implement compile-time detection of nested parallelism patterns and potential deadlock scenarios
6. **Prevention Policies:** Enforce lock ordering policies, add automatic deadlock avoidance in executor scheduling
7. **Testing Infrastructure:** Create automated stress tests exercising concurrent operations, add chaos engineering tests with random delays/contention
8. **CI Integration:** Run deadlock detection in nightly builds with TSAN, add performance regression tests for new deadlock risks
9. **Documentation:** Document safe parallelism patterns, create lock hierarchy guide, provide examples of anti-patterns to avoid

**Benefits:** Prevent production deadlocks, enable faster debugging, support safe parallelism optimizations, improve enterprise reliability

### Component(s)

C++

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
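
As a rough illustration of the "Runtime Detection" item in the proposal, a lock ordering validator could record which locks are held when another is acquired and flag any acquisition that would close a cycle. The sketch below is only that: `LockOrderGraph` and `OnAcquire` are hypothetical names, not existing Arrow APIs, locks are identified by plain strings for simplicity, and the class itself is not thread-safe (a real validator would need its own synchronization and per-thread held-lock tracking).

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper, not part of Arrow: stores "A was held while B was
// acquired" edges and rejects an acquisition that would create a cycle.
// Assumption: callers serialize access; this sketch does no locking itself.
class LockOrderGraph {
 public:
  // Record that `acquired` is being taken while the locks in `held` are owned.
  // Returns false if this inverts a previously observed order (a potential
  // deadlock) or re-acquires a held lock; nothing is recorded in that case.
  bool OnAcquire(const std::vector<std::string>& held,
                 const std::string& acquired) {
    for (const auto& h : held) {
      if (h == acquired || Reaches(acquired, h)) return false;
    }
    for (const auto& h : held) edges_[h].insert(acquired);
    return true;
  }

 private:
  // Iterative depth-first search: is `to` reachable from `from`?
  bool Reaches(const std::string& from, const std::string& to) const {
    std::set<std::string> seen;
    std::vector<std::string> stack{from};
    while (!stack.empty()) {
      std::string cur = stack.back();
      stack.pop_back();
      if (cur == to) return true;
      if (!seen.insert(cur).second) continue;  // already visited
      auto it = edges_.find(cur);
      if (it == edges_.end()) continue;
      for (const auto& next : it->second) stack.push_back(next);
    }
    return false;
  }

  std::map<std::string, std::set<std::string>> edges_;  // held -> acquired
};
```

With this sketch, observing `pool` before `queue` and `queue` before `io` means a later attempt to take `pool` while holding `io` is reported, even if no thread actually blocks during that run.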

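
The "Timeout Mechanisms" item could look something like the following in its simplest form: a bounded wait on a standard future that reports a timeout instead of blocking forever. This uses `std::future` rather than Arrow's own `Future` type, and `GetWithTimeout` is a hypothetical name chosen for illustration; it assumes C++17 for `std::optional`.

```cpp
#include <chrono>
#include <future>
#include <optional>

// Hypothetical helper, not an Arrow API: wait at most `timeout` for `fut`.
// Returns the value if it became ready in time, std::nullopt otherwise.
// A deferred future (std::launch::deferred) is also treated as "not ready",
// because wait_for never runs the deferred task.
template <typename T>
std::optional<T> GetWithTimeout(std::future<T>& fut,
                                std::chrono::milliseconds timeout) {
  if (fut.wait_for(timeout) != std::future_status::ready) {
    return std::nullopt;  // suspiciously long wait: caller can log or abort
  }
  return fut.get();
}
```

A caller that gets `std::nullopt` still owns a valid future and can retry, log the stalled operation, or escalate, which is the hook a timeout-based detector would need.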