mbutrovich opened a new issue, #21662:
URL: https://github.com/apache/datafusion/issues/21662

   ### Describe the bug
   
   Miri detects a Stacked Borrows violation in 
`datafusion/datasource-parquet/src/opener.rs` at line 1258:
   
   ```
   self.decoder.push_ranges(ranges, data)?;
   ```
   
   A `Unique` retag (mutable borrow) of the decoder is created, then 
invalidated by a `SharedReadOnly` retag at an `.await` point in the 
`unfold`-based stream. When the future resumes, the original `Unique` tag no 
longer exists in the borrow stack.
   
   This was found while testing DataFusion 54.0 with
[Apache DataFusion Comet](https://github.com/apache/datafusion-comet).
Comet runs Miri in CI and caught this:
[CI run](https://github.com/apache/datafusion-comet/actions/runs/24487781385/job/71566377807?pr=3916).
DataFusion does not currently run Miri in its own CI.
   
   ### To Reproduce
   
   Run Miri against any test that exercises the `PushDecoderStreamState` 
parquet stream path. In our case it was triggered by a test that calls 
`scan.execute(...).collect().await` on a parquet scan.
   
   ### Expected behavior
   
   No undefined behavior under Miri's Stacked Borrows model.
   
   ### Miri output
   
   ```
   error: Undefined Behavior: trying to retag from <28128923> for SharedReadWrite permission at alloc8513101[0x8], but that tag does not exist in the borrow stack for this location
       --> datafusion/datasource-parquet/src/opener.rs:1258:25
        |
   1258 |                         self.decoder.push_ranges(ranges, data)?;
        |                         ^^^^^^^^^^^^ this error occurs as part of two-phase retag at alloc8513101[0x8..0x20]
        |
        = help: this indicates a potential bug in the program: it performed an invalid operation, but the Stacked Borrows rules it violated are still experimental
        = help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
   help: <28128923> was created by a Unique retag at offsets [0x8..0x20]
   help: <28128923> was later invalidated at offsets [0x0..0x138] by a SharedReadOnly retag
   ```
   
   ### Additional context
   
   The full backtrace points to `PushDecoderStreamState::transition` -> 
`RowGroupsPrunedParquetOpen::build_stream` closure -> `futures::stream::Unfold` 
poll. The aliasing violation occurs because the decoder is mutably re-borrowed 
across a yield point in the `unfold` stream.
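
For readers unfamiliar with the code path, here is a minimal sketch of the general shape described above: a state value that owns a decoder, is moved into an async step in the style of an `unfold` closure, and calls a `&mut self` method on the decoder after a yield point. Every name in it (`ToyDecoder`, `ToyStreamState`, `YieldOnce`, `drain`) is hypothetical and is not DataFusion's code. The sketch is safe Rust using only the standard library, and it is not claimed to reproduce the Miri error; it only illustrates the borrow pattern under discussion.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for the push-based parquet decoder.
struct ToyDecoder {
    buffered: Vec<u8>,
}

impl ToyDecoder {
    /// Takes `&mut self`, like the `push_ranges` call in the report.
    fn push_ranges(&mut self, data: &[u8]) -> Result<(), String> {
        self.buffered.extend_from_slice(data);
        Ok(())
    }
}

/// Hypothetical stand-in for the stream state that owns the decoder.
struct ToyStreamState {
    decoder: ToyDecoder,
    pending: Vec<Vec<u8>>,
}

/// A future that returns `Pending` once, forcing a real yield point.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

impl ToyStreamState {
    /// One step in the style of an `unfold` closure: take the state by value,
    /// await (simulated) I/O, mutate the decoder, then hand the state back.
    async fn transition(mut self) -> Result<Option<(usize, Self)>, String> {
        let Some(data) = self.pending.pop() else {
            return Ok(None);
        };
        YieldOnce(false).await; // stands in for an async range fetch
        self.decoder.push_ranges(&data)?; // mutable borrow after the yield
        Ok(Some((self.decoder.buffered.len(), self)))
    }
}

/// Waker that does nothing; sufficient for a busy-polling executor.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Tiny busy-polling executor so the sketch needs no external crates.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

/// Drives the state machine to completion; returns the bytes buffered.
fn drain(chunks: Vec<Vec<u8>>) -> Result<usize, String> {
    let mut state = ToyStreamState {
        decoder: ToyDecoder { buffered: Vec::new() },
        pending: chunks,
    };
    let mut total = 0;
    while let Some((len, next)) = block_on(state.transition())? {
        total = len;
        state = next;
    }
    Ok(total)
}
```

With this sketch, `drain(vec![vec![1, 2], vec![3, 4, 5]])` returns `Ok(5)`. Passing the state by value through each step, as here, means the decoder is only ever reached through the state that currently owns it; whether the real code path can be restructured this way is a question for the maintainers.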
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

