benibus opened a new pull request, #35047:
URL: https://github.com/apache/arrow/pull/35047

   ### What changes are included in this PR?
   
   Increases the block size used in the `ScanWithParallelDecoding` test to 
reduce the number of (potentially parallel) parsing/decoding jobs from 1000+ to 
roughly 60 while increasing the runtime of each job. This should still satisfy 
the purpose of the test without going completely over the top.
   
   ### Are these changes tested?
   
   Yes, tested locally on the alpine docker image many times after successfully 
reproducing the original issue.
   
   ### Are there any user-facing changes?
   
   No
   
   ### Notes
   
   This doesn't solve the underlying cause (although the testing parameters 
were arguably far too unusual in the first place). However, I believe I've 
identified the issue via a core dump.
   
   The problem starts 
[here](https://github.com/apache/arrow/blob/47a602dbd9b7b7f7720a5e62467e3e6c61712cf3/cpp/src/arrow/json/reader.cc#L362-L369),
 where a `MappingGenerator` gets stacked on top of a generator that applies 
readahead. It seems that the underlying futures were completing very quickly, 
resulting in `AddCallback` being called recursively many, many times - starting 
[here](https://github.com/apache/arrow/blob/47a602dbd9b7b7f7720a5e62467e3e6c61712cf3/cpp/src/arrow/util/async_generator.h#L240).
 This leads to a stack overflow under specific circumstances.
   
   So, to fully guard against the problem, you'd probably want to change the 
logic of `MappingGenerator` to use `TryAddCallback` + an inner loop to avoid 
overflowing the stack. I'm not entirely sure whether doing this would be 
worthwhile, though.
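
To illustrate the idea, here is a minimal sketch that uses a toy future rather than `arrow::Future`. `ToyFuture`, `Stats`, `DepthGuard`, `PumpRecursive`, and `PumpLooped` are hypothetical names made up for this example, and the real `MappingGenerator` logic is more involved. It only assumes the general contract that an `AddCallback`-style call runs the callback synchronously when the future is already finished, while a `TryAddCallback`-style call returns `false` in that case and never runs it synchronously.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <utility>

// Toy stand-in for a future (hypothetical; NOT arrow::Future).
struct ToyFuture {
  std::optional<int> value;           // set once the future is finished
  std::function<void(int)> callback;  // stored only while unfinished

  bool is_finished() const { return value.has_value(); }

  // Runs the callback right away if the future is already finished.
  void AddCallback(std::function<void(int)> cb) {
    if (is_finished()) {
      cb(*value);
    } else {
      callback = std::move(cb);
    }
  }

  // Never runs the callback synchronously; returns false if already finished.
  bool TryAddCallback(std::function<void(int)> cb) {
    if (is_finished()) return false;
    callback = std::move(cb);
    return true;
  }
};

struct Stats {
  std::size_t depth = 0;      // current nesting of pump frames
  std::size_t max_depth = 0;  // deepest nesting observed
  long sum = 0;               // stand-in for "work done per item"
};

// Tracks how many pump frames are live at once.
struct DepthGuard {
  Stats& stats;
  explicit DepthGuard(Stats& s) : stats(s) {
    ++stats.depth;
    stats.max_depth = std::max(stats.max_depth, stats.depth);
  }
  ~DepthGuard() { --stats.depth; }
};

// Recursive style: every already-finished future adds another stack frame.
void PumpRecursive(int remaining, Stats& stats) {
  if (remaining == 0) return;
  DepthGuard guard(stats);
  ToyFuture fut{remaining};  // models a future that completed immediately
  fut.AddCallback([&stats, remaining](int v) {
    stats.sum += v;
    PumpRecursive(remaining - 1, stats);
  });
}

// Loop style: finished futures are drained inline without growing the stack.
void PumpLooped(int remaining, Stats& stats) {
  DepthGuard guard(stats);
  while (remaining > 0) {
    ToyFuture fut{remaining};
    bool deferred = fut.TryAddCallback([&stats, remaining](int v) {
      stats.sum += v;
      PumpLooped(remaining - 1, stats);  // resume later from a fresh stack
    });
    if (deferred) return;  // not taken in this toy: futures are always finished
    assert(fut.is_finished());
    stats.sum += *fut.value;
    --remaining;
  }
}
```

Calling `PumpRecursive(n, stats)` nests one frame per item, whereas `PumpLooped(n, stats)` stays at a single frame as long as the futures are already finished, which is the situation described above.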

