benibus opened a new pull request, #35047: URL: https://github.com/apache/arrow/pull/35047
### What changes are included in this PR? Increases the block size used in the `ScanWithParallelDecoding` test to reduce the number of (potentially parallel) parsing/decoding jobs from 1000+ to roughly 60 while increasing the runtime of each job. This should still satisfy the purpose of test without going completely over the top. ### Are these changes tested? Yes, tested locally on the alpine docker image many times after successfully reproducing the original issue. ### Are there any user-facing changes? No ### Notes This doesn't solve the underlying cause (although the testing parameters were arguably far too unusual in the first place), however I do believe that I've identified the issue via a core dump. The problem starts [here](https://github.com/apache/arrow/blob/47a602dbd9b7b7f7720a5e62467e3e6c61712cf3/cpp/src/arrow/json/reader.cc#L362-L369), where a `MappingGenerator` gets stacked on top of a generator that applies readahead. It seems that the underlying futures were completing very quickly, resulting in `AddCallback` being called recursively many, many times - starting [here](https://github.com/apache/arrow/blob/47a602dbd9b7b7f7720a5e62467e3e6c61712cf3/cpp/src/arrow/util/async_generator.h#L240). This leads to a stack overflow under specific circumstances. So, to fully guard against the problem, you'd probably want to change the logic of `MappingGenerator` to use `TryAddCallback` + an inner loop to avoid overflowing the stack. Not entirely sure if doing this would be worthwhile though. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
