mbutrovich opened a new issue, #21662: URL: https://github.com/apache/datafusion/issues/21662
### Describe the bug

Miri detects a Stacked Borrows violation in `datafusion/datasource-parquet/src/opener.rs` at line 1258:

```
self.decoder.push_ranges(ranges, data)?;
```

A `Unique` retag (mutable borrow) of the decoder is created, then invalidated by a `SharedReadOnly` retag at an `.await` point in the `unfold`-based stream. When the future resumes, the original `Unique` tag no longer exists in the borrow stack.

This was found while testing DataFusion 54.0 with [Apache DataFusion Comet](https://github.com/apache/datafusion-comet). Comet runs Miri in CI and caught this: [CI run](https://github.com/apache/datafusion-comet/actions/runs/24487781385/job/71566377807?pr=3916). DataFusion does not currently run Miri in its own CI.

### To Reproduce

Run Miri against any test that exercises the `PushDecoderStreamState` parquet stream path. In our case it was triggered by a test that calls `scan.execute(...).collect().await` on a parquet scan.

### Expected behavior

No undefined behavior under Miri's Stacked Borrows model.
### Miri output

```
error: Undefined Behavior: trying to retag from <28128923> for SharedReadWrite permission at alloc8513101[0x8], but that tag does not exist in the borrow stack for this location
    --> datafusion/datasource-parquet/src/opener.rs:1258:25
     |
1258 |                         self.decoder.push_ranges(ranges, data)?;
     |                         ^^^^^^^^^^^^ this error occurs as part of two-phase retag at alloc8513101[0x8..0x20]
     |
     = help: this indicates a potential bug in the program: it performed an invalid operation, but the Stacked Borrows rules it violated are still experimental
     = help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
help: <28128923> was created by a Unique retag at offsets [0x8..0x20]
help: <28128923> was later invalidated at offsets [0x0..0x138] by a SharedReadOnly retag
```

### Additional context

The full backtrace points to `PushDecoderStreamState::transition` -> `RowGroupsPrunedParquetOpen::build_stream` closure -> `futures::stream::Unfold` poll. The aliasing violation occurs because the decoder is mutably re-borrowed across a yield point in the `unfold` stream.
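For readers unfamiliar with the pattern, here is a rough, std-only sketch of the shape of code involved: an `unfold`-style step that takes the state by value, awaits, and then mutably borrows the decoder. All names (`Decoder`, `StreamState`, `YieldOnce`, `block_on`, `drain`) are hypothetical stand-ins, and this `push_ranges` has a simplified signature that is not the real one. The sketch contains no `unsafe` code, so it should not reproduce the Miri error by itself; it only illustrates where the mutable borrow sits relative to the yield point.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for the push-based parquet decoder.
#[derive(Default)]
struct Decoder {
    buffered: Vec<u8>,
}

impl Decoder {
    /// Simplified signature (assumption); takes `&mut self` like the flagged call.
    fn push_ranges(&mut self, data: Vec<u8>) -> Result<(), String> {
        self.buffered.extend(data);
        Ok(())
    }
}

/// Hypothetical stand-in for `PushDecoderStreamState`.
struct StreamState {
    decoder: Decoder,
    pending: Vec<Vec<u8>>,
}

/// A future that returns `Pending` exactly once, standing in for real I/O.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

impl StreamState {
    /// One `unfold`-style step: consume the state, return an item plus the next state.
    async fn transition(mut self) -> Option<(usize, Self)> {
        let data = self.pending.pop()?;
        YieldOnce(false).await; // yield point: `self` is stored inside the future
        self.decoder.push_ranges(data).ok()?; // mutable borrow after resuming
        Some((self.decoder.buffered.len(), self))
    }
}

struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Tiny busy-polling executor, sufficient for this sketch only.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

/// Drive the steps to completion, roughly as polling an `Unfold` stream would.
fn drain(pending: Vec<Vec<u8>>) -> Vec<usize> {
    let mut state = StreamState { decoder: Decoder::default(), pending };
    let mut items = Vec::new();
    while let Some((item, next)) = block_on(state.transition()) {
        items.push(item);
        state = next;
    }
    items
}
```

Calling `drain(vec![vec![1, 2], vec![3]])` runs two steps and returns the decoder's buffered length after each one. In the real code the state is threaded through `futures::stream::unfold` rather than a hand-written loop, which is where the backtrace in this report ends.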
