evangelisilva opened a new pull request, #20376: URL: https://github.com/apache/datafusion/pull/20376
# UDTF Argument Coercion Suppression

## Which issue does this PR close?

Closes #20293.

## Rationale for this change

Currently, the arguments of User-Defined Table Functions (UDTFs) in DataFusion are automatically coerced and simplified before being passed to the function creator. This happens against an empty schema (`DFSchema::empty()`). If a UDTF call uses arguments that contain identifiers (e.g., `scan_with(index=['a', 'b'])`), the simplifier fails with `Schema error: No field named index` because it attempts to resolve `index` as a column reference. This prevents UDTFs from implementing custom argument parsing logic that relies on identifiers or complex expressions.

## What changes are included in this PR?

1. **Modified `TableFunctionImpl` trait**: Added a new method `coerce_arguments(&self) -> bool` that defaults to `true`. This allows UDTF authors to opt out of automatic coercion.
2. **Updated `TableFunction` struct**: Exposed `coerce_arguments` on the wrapper struct.
3. **Updated `SessionContextProvider`**: Modified the SQL planner integration in `session_state.rs` to check the `coerce_arguments` flag. If it is `false`, the raw `Expr` arguments are passed directly to the UDTF creator without modification.
4. **Unit Tests**: Added tests in `session_state.rs` to verify both the default behavior (automatic coercion, which fails on identifiers) and the new suppressed behavior (which allows identifiers).

## Are these changes tested?

Yes. I've added a new test module `udtf_tests` in `datafusion/core/src/execution/session_state.rs` containing:

- `test_udtf_no_coercion`: Verifies that identifiers survive when coercion is disabled.
- `test_udtf_default_coercion`: Verifies that the existing behavior (failing on identifiers) is preserved by default to ensure no regressions.

## Are there any user-facing changes?

Yes. There is a new method on the `TableFunctionImpl` trait. However, because it has a default implementation that returns `true`, it is **backward compatible** and will not break existing UDTF implementations. UDTF authors who need the new behavior only need to override this method.
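
The opt-out mechanism described in this PR can be sketched in a few lines of standalone Rust. This is a minimal illustration under stated assumptions, not the code from the PR: `Expr`, `simplify_against_empty_schema`, `plan_table_function`, `DefaultFunc`, and `RawArgsFunc` are hypothetical stand-ins written for this example, and `call` returns a `String` rather than a table provider so that the block has no dependency on the DataFusion crates. Only the method name `coerce_arguments(&self) -> bool` and its default of `true` are taken from the description above.

```rust
// Hypothetical stand-in for DataFusion's `Expr`; only what the sketch needs.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Literal(i64),
    Column(String),
    Add(Box<Expr>, Box<Expr>),
}

// Hypothetical simplifier that mimics resolving expressions against an
// empty schema: constants are folded, and any identifier is an error
// because there is no column it could refer to.
fn simplify_against_empty_schema(expr: &Expr) -> Result<Expr, String> {
    match expr {
        Expr::Literal(v) => Ok(Expr::Literal(*v)),
        Expr::Column(name) => Err(format!("Schema error: No field named {}", name)),
        Expr::Add(l, r) => {
            let l = simplify_against_empty_schema(l)?;
            let r = simplify_against_empty_schema(r)?;
            match (l, r) {
                (Expr::Literal(a), Expr::Literal(b)) => Ok(Expr::Literal(a + b)),
                (l, r) => Ok(Expr::Add(Box::new(l), Box::new(r))),
            }
        }
    }
}

// Simplified model of the trait: the new method has a default body, so
// existing implementors compile unchanged and keep the old behavior.
trait TableFunctionImpl {
    // In this sketch the "table" is just a debug rendering of the arguments.
    fn call(&self, args: &[Expr]) -> Result<String, String>;

    // Defaults to `true`, i.e. arguments are coerced/simplified first.
    fn coerce_arguments(&self) -> bool {
        true
    }
}

// An existing-style UDTF that does not override the new method.
struct DefaultFunc;

impl TableFunctionImpl for DefaultFunc {
    fn call(&self, args: &[Expr]) -> Result<String, String> {
        Ok(format!("{:?}", args))
    }
}

// A UDTF that opts out and wants to see the raw expressions.
struct RawArgsFunc;

impl TableFunctionImpl for RawArgsFunc {
    fn call(&self, args: &[Expr]) -> Result<String, String> {
        Ok(format!("{:?}", args))
    }

    fn coerce_arguments(&self) -> bool {
        false
    }
}

// Models the planner-side check: simplify only when the function asks for it,
// otherwise hand the arguments over untouched.
fn plan_table_function(
    func: &dyn TableFunctionImpl,
    args: Vec<Expr>,
) -> Result<String, String> {
    let args = if func.coerce_arguments() {
        args.iter()
            .map(simplify_against_empty_schema)
            .collect::<Result<Vec<_>, _>>()?
    } else {
        args
    };
    func.call(&args)
}
```

In this model, passing `Expr::Column("index".to_string())` to `plan_table_function` fails for `DefaultFunc` with the schema error, while `RawArgsFunc` receives the identifier unchanged. A literal sum such as `1 + 2` is still folded for `DefaultFunc`, which mirrors the claim that the default path keeps the existing behavior.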
