vanshaj2023 commented on issue #49272: URL: https://github.com/apache/arrow/issues/49272#issuecomment-3902522493
Hi @raulcd, I'd like to work on this issue. After reviewing the repository, I think the segfault in `ReaderTest.MultipleChunksParallel` is likely related to thread safety in the parallel JSON reader.

**Implementation Approach:**

1. Check whether the parallel chunk processing in `cpp/src/arrow/json/reader.cc` has race conditions when multiple threads access shared resources.
2. Check memory management around the `ThreadedTaskGroup` usage within the JSON reader, looking for a potential use-after-free or improper synchronization.
3. Review thread pool initialization and cleanup for MinGW-specific thread handling differences compared to MSVC.
4. Add mutex guards around shared state modifications in the parallel parsing code.
5. Verify proper lifetime management of chunked JSON buffers during concurrent reads.

The failure is intermittent and only occurs on MinGW, which suggests a platform-specific threading or memory alignment issue in the parallel reader implementation.

Could you please assign this issue to me? Does this approach sound reasonable, or am I missing something? Are there specific areas in the JSON reader code I should prioritize?

Thanks!
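To make items 4 and 5 concrete, here is a minimal sketch of the kind of guard meant. It is not Arrow code: `SharedResults`, `ProcessChunk`, and `ProcessAllChunks` are hypothetical names, and plain `std::thread` stands in for the reader's task group. Each worker owns a copy of its chunk buffer, and every write to shared state happens under a mutex.

```cpp
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for state shared by parallel chunk parsers.
// None of these names come from Arrow.
struct SharedResults {
  std::mutex mutex;
  std::vector<std::size_t> chunk_sizes;  // guarded by `mutex`
  std::size_t total_bytes = 0;           // guarded by `mutex`
};

// "Parses" one chunk (here it only measures it) and records the result.
// The chunk is taken by value, so the worker owns its buffer for the
// whole call and cannot observe it being freed elsewhere.
void ProcessChunk(std::string chunk, SharedResults* results) {
  const std::size_t size = chunk.size();  // work done outside the lock
  std::lock_guard<std::mutex> lock(results->mutex);
  results->chunk_sizes.push_back(size);
  results->total_bytes += size;
}

// Runs one thread per chunk and joins all of them before returning, so
// `results` is never read while a worker may still be writing to it.
std::size_t ProcessAllChunks(const std::vector<std::string>& chunks,
                             SharedResults* results) {
  std::vector<std::thread> workers;
  workers.reserve(chunks.size());
  for (const std::string& chunk : chunks) {
    workers.emplace_back(ProcessChunk, chunk, results);
  }
  for (std::thread& worker : workers) {
    worker.join();
  }
  return results->total_bytes;
}
```

Whether the actual reader needs a lock like this, or already synchronizes through its task group, is exactly what would have to be confirmed in `reader.cc` first.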
