kosiew opened a new pull request, #19407:
URL: https://github.com/apache/datafusion/pull/19407

   ## Which issue does this PR close?
   
   * Closes #16279.
   
   ## Rationale for this change
   
   Substrait round-trip tests and ~35 sqllogictest cases were failing with 
errors like **"No table named 'generate_series()'"** when deserializing a 
Substrait plan back into a DataFusion logical plan.
   
   The root cause is that table function invocations (e.g. `generate_series`, 
`range`) were being serialized as ordinary `ReadRel::NamedTable` scans, so the 
consumer attempted to resolve a table named like the function call rather than 
reconstructing a table function scan.
   
   This change adds a stable mechanism to preserve table-function intent and 
evaluated arguments in the Substrait plan, enabling correct reconstruction 
during consumption and eliminating the "No table named ..." failures.
   
   ## What changes are included in this PR?
   
   * **Catalog API extension:**
   
     * Added `TableProvider::table_function_details()` (default `None`) and a 
`TableFunctionDetails` struct carrying:
   
       * function name (`&'static str`)
       * evaluated argument literals (`&[ScalarValue]`)
     * Kept a convenience `table_function_name()` helper that delegates to 
`table_function_details()`.
   
   * **Expose shared scalar constant:**
   
     * Made `datafusion_common::scalar::NANOS_PER_DAY` public and reused it in 
date handling to avoid local redefinition.
   
   * **Built-in table function provider updates:**
   
     * Updated `GenerateSeriesTable` to cache evaluated arguments and to 
implement `table_function_details()` so the producer can serialize table 
function calls without downcasting.
     * Refactored argument parsing to consistently treat `NULL` / missing 
inputs and to always represent table function calls with a complete 3-argument 
list (start, end, step) where applicable.
   
   * **Substrait producer support (logical plan):**
   
     * Added a `ReadRel.advanced_extension` payload when a `TableScan` 
originates from a table function provider.
     * Introduced a dedicated protobuf message `TableFunctionReadRelExtension` 
encoded as `google.protobuf.Any` using a shared `type_url` constant:
   
       * `type.googleapis.com/datafusion.substrait.TableFunctionReadRel`
     * This payload records the table function name and **evaluated** argument 
literals (after defaults/coercions).
   
   * **Substrait consumer support (logical plan):**
   
     * Added parsing of the `ReadRel.advanced_extension` payload and 
reconstruction of the table function scan by:
   
       * looking up a registered table function by name 
(`SubstraitConsumer::get_table_function`)
       * calling `create_table_provider` with literal `Expr` arguments
     * Added a `provider_override` path so a reconstructed provider can be used 
even when the `NamedTable` reference would not resolve.
   
   * **Code organization:**
   
     * Moved duplicated VirtualTable literal conversion helpers into 
`logical_plan::utils` (producer + consumer), reducing duplication and improving 
maintainability.
   
   * **Dependencies:**
   
     * Added `datafusion-functions-table` dependency to `datafusion-substrait` 
so built-in table function behavior (and tests) are available in round-trip 
contexts.
   
   * **Tests:**
   
     * Added a producer unit test ensuring `generate_series` table scans emit 
the expected advanced_extension payload and argument normalization.
     * Added an integration test `table_function_roundtrip.rs` validating 
round-trip correctness for:
   
       * `generate_series(1, 4)`
       * `range(2, 8, 3)`
       * `generate_series(NULL, 5)`
   
   ## Are these changes tested?
   
   ### Before
   ```
   ❯ cargo test --test sqllogictests -- --substrait-round-trip array.slt:6472
       Finished `test` profile [unoptimized + debuginfo] target(s) in 0.78s
        Running bin/sqllogictests.rs 
(target/debug/deps/sqllogictests-3a351d3176e1df99)
   Completed 2 test files in 1 second                                           
        External error: 1 errors in file 
/Users/kosiew/GitHub/datafusion/datafusion/sqllogictest/test_files/array.slt
   
   1. query failed: DataFusion error: Error during planning: No table named 
'generate_series()'
   ```
   
   ### After
   ```
   ❯ cargo test --test sqllogictests -- --substrait-round-trip array.slt:6472
       Finished `test` profile [unoptimized + debuginfo] target(s) in 0.58s
        Running bin/sqllogictests.rs 
(target/debug/deps/sqllogictests-3a351d3176e1df99)
   Completed 2 test files in 1 second
   ``` 
   
   Yes.
   
   * Added a producer-side unit test that verifies the 
`ReadRel.advanced_extension` is present, uses the expected type URL, and 
encodes normalized arguments for `generate_series(1, 3)`.
   * Added new Substrait round-trip integration tests that serialize → 
deserialize → execute and compare results for `generate_series` and `range`, 
including a NULL-argument case.
   
   These tests exercise both the new producer metadata emission path and the 
new consumer reconstruction path, covering the scenario that caused the 
sqllogictest failures.
   
   ## Are there any user-facing changes?
   
   * **Improved Substrait round-trip behavior** for queries using built-in 
table functions like `generate_series` and `range`. Previously, such plans 
could fail to round-trip with "No table named ..." errors; with this change 
they can be reconstructed and executed correctly.
   
   * **New (optional) API hook for custom table providers:**
   
     * `TableProvider::table_function_details()` allows custom table-function 
providers to participate in Substrait serialization without needing 
producer-side downcasting.
   
   If this new method is considered a public API surface change, the PR should 
carry the **`api change`** label.
   
   ## LLM-generated code disclosure
   
   This PR includes LLM-generated code and comments. All LLM-generated content 
has been manually reviewed and tested.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to