lidavidm commented on pull request #11207:
URL: https://github.com/apache/arrow/pull/11207#issuecomment-925065761


   Ah, most of the stack frames look like this, once you remove irrelevant ones:
   
   ```
   2021-09-22T14:19:28.6454400Z     frame #83: 0x0000000105789ea0 libarrow_dataset.600.0.0.dylib`arrow::MergedGenerator<arrow::dataset::EnumeratedRecordBatch>::InnerCallback::operator(this=0x00007fc747637bc8, maybe_next=0x00007fc74763cd20)(arrow::Result<arrow::dataset::EnumeratedRecordBatch> const&) at async_generator.h:1029:14
   2021-09-22T14:19:28.6479720Z     frame #91: 0x0000000105789f70 libarrow_dataset.600.0.0.dylib`arrow::MergedGenerator<arrow::dataset::EnumeratedRecordBatch>::InnerCallback::operator(this=0x00007fc74763d1a8, maybe_next=0x00007fc74763b8f0)(arrow::Result<arrow::dataset::EnumeratedRecordBatch> const&) at async_generator.h:1031:48
   2021-09-22T14:19:28.6503520Z     frame #99: 0x0000000105789f70 libarrow_dataset.600.0.0.dylib`arrow::MergedGenerator<arrow::dataset::EnumeratedRecordBatch>::InnerCallback::operator(this=0x00007fc7476357f8, maybe_next=0x00007fc74763cde0)(arrow::Result<arrow::dataset::EnumeratedRecordBatch> const&) at async_generator.h:1031:48
   2021-09-22T14:19:28.6524850Z     frame #107: 0x0000000105789f70 libarrow_dataset.600.0.0.dylib`arrow::MergedGenerator<arrow::dataset::EnumeratedRecordBatch>::InnerCallback::operator(this=0x00007fc74763c008, maybe_next=0x00007fc74763cc60)(arrow::Result<arrow::dataset::EnumeratedRecordBatch> const&) at async_generator.h:1031:48
   2021-09-22T14:19:28.6552850Z     frame #115: 0x0000000105789f70 libarrow_dataset.600.0.0.dylib`arrow::MergedGenerator<arrow::dataset::EnumeratedRecordBatch>::InnerCallback::operator(this=0x00007fc74747fec8, maybe_next=0x00007fc74747a480)(arrow::Result<arrow::dataset::EnumeratedRecordBatch> const&) at async_generator.h:1031:48
   2021-09-22T14:19:28.6580800Z     frame #123: 0x0000000105789f70 libarrow_dataset.600.0.0.dylib`arrow::MergedGenerator<arrow::dataset::EnumeratedRecordBatch>::InnerCallback::operator(this=0x00007fc7475919c8, maybe_next=0x00007fc74759b2a0)(arrow::Result<arrow::dataset::EnumeratedRecordBatch> const&) at async_generator.h:1031:48
   ```
   
   I suppose what's going on is that the merged generator's inner generator is returning an already-completed future, so its callbacks run immediately. In the process, another pull is made on the merged generator, whose callback then runs in turn, and so we end up recursing (with depth limited only by the size of the underlying generator). This only affects the new test, which slices batches up so that there are 1024 batches; at roughly 10 stack frames per batch, that is enough to cause a stack overflow. A similar issue happens if we reduce the stack size on Linux.
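
   To illustrate the mechanism, here is a minimal toy sketch. It does not use Arrow's actual `Future` or `MergedGenerator`; `ReadyFuture`, `Source`, `PullRecursive`, `PullIterative`, and `MaxRecursiveDepth` are all hypothetical names made up for this example. It only models the assumption that a completed future runs its callback synchronously, and compares pulling from inside the callback against draining in a loop.

   ```cpp
   #include <cassert>
   #include <cstddef>

   // One item of the toy stream; `end` marks exhaustion.
   struct Item {
     bool end;
     int value;
   };

   // Toy stand-in for a future that is already finished when returned.
   // NOT arrow::Future: it only models "the callback runs immediately".
   struct ReadyFuture {
     Item item;

     template <typename Callback>
     void AddCallback(Callback cb) const {
       // Already complete, so the callback runs on the caller's stack.
       cb(item);
     }
   };

   // Toy generator yielding `limit` items, always as completed futures.
   struct Source {
     int next;
     int limit;
     explicit Source(int limit_in) : next(0), limit(limit_in) {}

     ReadyFuture operator()() {
       if (next < limit) return ReadyFuture{Item{false, next++}};
       return ReadyFuture{Item{true, 0}};
     }
   };

   struct DepthStats {
     std::size_t current = 0;
     std::size_t max = 0;
   };

   // Recursive style: each callback pulls the next item from inside itself,
   // so the nesting depth grows with the number of items.
   void PullRecursive(Source& source, DepthStats& stats) {
     ++stats.current;
     if (stats.current > stats.max) stats.max = stats.current;
     source().AddCallback([&](const Item& item) {
       if (!item.end) PullRecursive(source, stats);
     });
     --stats.current;
   }

   // Iterative style: consume immediately-available results in a loop,
   // so the stack depth stays constant regardless of the item count.
   std::size_t PullIterative(Source& source) {
     std::size_t count = 0;
     for (;;) {
       ReadyFuture fut = source();
       if (fut.item.end) break;
       ++count;
     }
     return count;
   }

   // Helper: maximum nesting depth reached when pulling recursively.
   std::size_t MaxRecursiveDepth(int num_batches) {
     assert(num_batches >= 0);
     Source source(num_batches);
     DepthStats stats;
     PullRecursive(source, stats);
     return stats.max;
   }
   ```

   In this toy, the recursive variant nests one level per item (plus one for the end marker), while the loop stays flat. Whether a loop like this is the right fix for the merged generator is a separate question; the sketch only shows why the depth scales with the number of batches.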

