ametel01 opened a new pull request, #23172:
URL: https://github.com/apache/datafusion/pull/23172

   ## Which issue does this PR close?
   
   - Closes #23171.
   
   ## Rationale for this change
   
   Imported Substrait physical plans currently turn `ReadRel.LocalFiles` 
entries into a local filesystem-backed Parquet `DataSourceExec` without 
requiring the embedding host to approve the referenced local paths. In hosts 
that accept physical Substrait plans from lower-trust callers, a serialized 
plan can therefore select process-local Parquet files outside the intended 
dataset roots.
   
   This change makes local file access during physical plan import explicit 
instead of ambient.
   
   ## What changes are included in this PR?
   
   - Adds `PhysicalPlanConsumerOptions` for Substrait physical plan import.
   - Keeps `from_substrait_rel` as a default-deny wrapper for local file 
imports.
   - Adds `from_substrait_rel_with_options` so callers can opt in with allowed 
local file roots.
   - Canonicalizes imported local file paths and configured roots before 
comparing them.
   - Rejects local file globs and folders in physical plan import rather than 
accepting them without a policy.
   - Updates physical roundtrip tests to pass explicit allowed roots.
   - Adds regression coverage for missing allowed roots and paths outside the 
allowed root.
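
The canonicalize-then-compare check described in the list above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the code in this PR: `LocalFilePolicy`, `with_allowed_root`, and `check` are hypothetical names, and the real `PhysicalPlanConsumerOptions` may be shaped differently.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical policy type for illustration only; it is not the PR's
/// `PhysicalPlanConsumerOptions`.
#[derive(Debug, Default)]
pub struct LocalFilePolicy {
    allowed_roots: Vec<PathBuf>,
}

impl LocalFilePolicy {
    /// Opt in to one more root directory that local files may be read from.
    pub fn with_allowed_root(mut self, root: impl Into<PathBuf>) -> Self {
        self.allowed_roots.push(root.into());
        self
    }

    /// Returns the canonical path when `candidate` is a regular file under
    /// one of the allowed roots, and an error otherwise.
    pub fn check(&self, candidate: &Path) -> io::Result<PathBuf> {
        // Default-deny: with no configured roots, nothing is readable.
        if self.allowed_roots.is_empty() {
            return Err(io::Error::new(
                io::ErrorKind::PermissionDenied,
                "no allowed local file roots configured",
            ));
        }

        // Resolve symlinks and `..` segments before comparing anything.
        let canonical = fs::canonicalize(candidate)?;

        // Folders (and anything else that is not a regular file) are rejected.
        if !canonical.is_file() {
            return Err(io::Error::new(
                io::ErrorKind::PermissionDenied,
                "only regular files are accepted",
            ));
        }

        for root in &self.allowed_roots {
            // Roots are canonicalized too, so both sides use the same form.
            let root = fs::canonicalize(root)?;
            // `Path::starts_with` compares whole components, so `/data2`
            // is not treated as being inside `/data`.
            if canonical.starts_with(&root) {
                return Ok(canonical);
            }
        }

        Err(io::Error::new(
            io::ErrorKind::PermissionDenied,
            "path is outside the allowed local file roots",
        ))
    }
}
```

Canonicalizing both sides first means a path such as `allowed/../outside/b.parquet` is compared in its resolved form and is rejected even though it textually begins with an allowed root.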
   
   ## Are these changes tested?
   
   Yes.
   
   - `cargo fmt --all --check`
   - `cargo test -p datafusion-substrait`
   - `cargo clippy -p datafusion-substrait --all-targets --all-features -- -D warnings`
   - `cargo clippy --all-targets --all-features -- -D warnings`
   
   ## Are there any user-facing changes?
   
   Yes. Substrait physical plan consumers that import `ReadRel.LocalFiles` now 
need to call `from_substrait_rel_with_options` and explicitly configure allowed 
local file roots. The existing `from_substrait_rel` API no longer imports local 
files by default.
   
   This is an intentional security hardening change. It may require the 
`api change` label because it changes behavior for the existing public API and 
adds a new opt-in API.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

