JerAguilon commented on issue #37796: URL: https://github.com/apache/arrow/issues/37796#issuecomment-1728071032
I made a repro. I patched https://github.com/apache/arrow/pull/34234 so I could build a simple example in Python, but mechanically the bug exists in C++ too.

https://gist.github.com/JerAguilon/5a6a80411fd53dad9d9d547003bec12e

Here we run 10 concurrent simple asof joins (500 rows on the left-hand side, 5k rows on the right-hand side) and `union` them. `500 rows * 10 asofs = 5000`, so we expect 5k rows out. The right-hand side is a Parquet file with row groups of size 1.

If I place an `ARROW_LOG(ERROR) << "paused";` statement in `BackPressureController::Pause()` ([here](https://github.com/apache/arrow/blob/2455bc07e09cd5341d1fabdb293afbd07682f0b2/cpp/src/arrow/acero/asof_join_node.cc#L540C1-L540C1)), I get several "paused" logs after the 5000th row is emitted from `unioned.to_batches`.
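For readers who do not want to open the gist, here is a minimal pure-Python sketch of why 5k rows are expected. It is not the gist's code and does not use Acero: `asof_join`, the row layout, the timestamps, and the tolerance are all hypothetical, and the joins run sequentially rather than concurrently. It only illustrates that an asof join emits one output row per left-hand row, so the union of 10 joins over a 500-row left side has 5000 rows. It says nothing about backpressure itself.

```python
from bisect import bisect_right


def asof_join(left, right, tolerance):
    """Hypothetical backward as-of join over rows sorted by time.

    left and right are lists of (time, value) tuples. Exactly one output
    row is produced per left row, matched or not.
    """
    right_times = [t for t, _ in right]
    out = []
    for t, left_value in left:
        # Index of the last right row whose time is <= t (or -1 if none).
        i = bisect_right(right_times, t) - 1
        if i >= 0 and t - right_times[i] <= tolerance:
            out.append((t, left_value, right[i][1]))
        else:
            # Unmatched left rows are still emitted, with a null right value.
            out.append((t, left_value, None))
    return out


NUM_JOINS = 10     # number of asof joins that get unioned
LEFT_ROWS = 500    # rows on the left-hand side
RIGHT_ROWS = 5000  # rows on the right-hand side

# Assumed data: left timestamps are every 10th right timestamp.
left = [(t * 10, f"l{t}") for t in range(LEFT_ROWS)]
right = [(t, f"r{t}") for t in range(RIGHT_ROWS)]

# "Union" here is plain concatenation of the per-join outputs.
unioned = []
for _ in range(NUM_JOINS):
    unioned.extend(asof_join(left, right, tolerance=5))

print(len(unioned))  # 5000
```

Since every expected row has been delivered once the 5000th row comes out, any "paused" log after that point is a pause request issued when there is nothing left to wait for.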
