lidavidm commented on pull request #11207: URL: https://github.com/apache/arrow/pull/11207#issuecomment-925065761
Ah, most of the stack frames look like this, once you remove irrelevant ones:

```
2021-09-22T14:19:28.6454400Z frame #83: 0x0000000105789ea0 libarrow_dataset.600.0.0.dylib`arrow::MergedGenerator<arrow::dataset::EnumeratedRecordBatch>::InnerCallback::operator(this=0x00007fc747637bc8, maybe_next=0x00007fc74763cd20)(arrow::Result<arrow::dataset::EnumeratedRecordBatch> const&) at async_generator.h:1029:14
2021-09-22T14:19:28.6479720Z frame #91: 0x0000000105789f70 libarrow_dataset.600.0.0.dylib`arrow::MergedGenerator<arrow::dataset::EnumeratedRecordBatch>::InnerCallback::operator(this=0x00007fc74763d1a8, maybe_next=0x00007fc74763b8f0)(arrow::Result<arrow::dataset::EnumeratedRecordBatch> const&) at async_generator.h:1031:48
2021-09-22T14:19:28.6503520Z frame #99: 0x0000000105789f70 libarrow_dataset.600.0.0.dylib`arrow::MergedGenerator<arrow::dataset::EnumeratedRecordBatch>::InnerCallback::operator(this=0x00007fc7476357f8, maybe_next=0x00007fc74763cde0)(arrow::Result<arrow::dataset::EnumeratedRecordBatch> const&) at async_generator.h:1031:48
2021-09-22T14:19:28.6524850Z frame #107: 0x0000000105789f70 libarrow_dataset.600.0.0.dylib`arrow::MergedGenerator<arrow::dataset::EnumeratedRecordBatch>::InnerCallback::operator(this=0x00007fc74763c008, maybe_next=0x00007fc74763cc60)(arrow::Result<arrow::dataset::EnumeratedRecordBatch> const&) at async_generator.h:1031:48
2021-09-22T14:19:28.6552850Z frame #115: 0x0000000105789f70 libarrow_dataset.600.0.0.dylib`arrow::MergedGenerator<arrow::dataset::EnumeratedRecordBatch>::InnerCallback::operator(this=0x00007fc74747fec8, maybe_next=0x00007fc74747a480)(arrow::Result<arrow::dataset::EnumeratedRecordBatch> const&) at async_generator.h:1031:48
2021-09-22T14:19:28.6580800Z frame #123: 0x0000000105789f70 libarrow_dataset.600.0.0.dylib`arrow::MergedGenerator<arrow::dataset::EnumeratedRecordBatch>::InnerCallback::operator(this=0x00007fc7475919c8, maybe_next=0x00007fc74759b2a0)(arrow::Result<arrow::dataset::EnumeratedRecordBatch> const&) at async_generator.h:1031:48
```

I suppose what's going on is that the merged generator's inner generator is returning an already-completed future, so its callbacks run immediately. In the process another pull is made on the merged generator, whose callback then runs in turn, and hence we get stuck in recursion (limited only by the size of the underlying generator).

This only affects the new test, which slices batches up so that there are 1024 batches. At roughly 10 stack frames per batch, that is enough to cause a stack overflow. A similar issue happens if we drop the stack size on Linux.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
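
To make the suspected mechanism concrete, here is a minimal C++ sketch of how an already-completed future can turn a callback chain into deep recursion. Everything in it is hypothetical and written for illustration: `ToyFuture`, `CountingGenerator`, `DepthTracker`, `MaxDepthRecursive` and `MaxDepthIterative` are toy stand-ins, not `arrow::Future`, `arrow::MergedGenerator`, or anything else from Arrow's API. The iterative variant is only one possible way to keep the stack flat and is not necessarily what this PR does.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>

// Toy stand-in for a future that is ALWAYS already finished (an assumption
// made to mirror the suspected scenario); this is not arrow::Future.
struct ToyFuture {
  std::optional<int> value;  // std::nullopt marks end of stream

  // Because the future is already complete, the callback runs inline,
  // on the caller's stack, instead of being deferred.
  template <typename Callback>
  void AddCallback(Callback&& cb) const { cb(value); }
};

// Hypothetical generator yielding `remaining` items, then end of stream.
struct CountingGenerator {
  int remaining;

  ToyFuture operator()() {
    assert(remaining >= 0);
    if (remaining == 0) return ToyFuture{std::nullopt};
    --remaining;
    return ToyFuture{remaining};
  }
};

// Records how many pulls are simultaneously live on the stack.
struct DepthTracker {
  int current = 0;
  int max = 0;
};

// Pulls the next item from inside the previous item's callback. With
// completed futures, each item adds another level of nesting.
void PullRecursive(CountingGenerator& gen, DepthTracker& depth) {
  ++depth.current;
  depth.max = std::max(depth.max, depth.current);
  gen().AddCallback([&](std::optional<int> item) {
    if (item.has_value()) PullRecursive(gen, depth);
  });
  --depth.current;
}

int MaxDepthRecursive(int num_items) {
  CountingGenerator gen{num_items};
  DepthTracker depth;
  PullRecursive(gen, depth);
  return depth.max;
}

// One possible mitigation (an assumption, not the PR's fix): the callback
// only records whether there is more work, and a loop issues the next pull.
int MaxDepthIterative(int num_items) {
  CountingGenerator gen{num_items};
  DepthTracker depth;
  ++depth.current;
  depth.max = std::max(depth.max, depth.current);
  bool more = true;
  while (more) {
    gen().AddCallback([&](std::optional<int> item) { more = item.has_value(); });
  }
  --depth.current;
  return depth.max;
}
```

In this toy model, `MaxDepthRecursive(1024)` nests one pull per item plus one for the end-of-stream marker, while `MaxDepthIterative(1024)` stays at a single level. The real call chain has several frames per pull, which is why the nesting is so much more expensive in the backtrace above.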
