kosiew opened a new pull request, #19407:
URL: https://github.com/apache/datafusion/pull/19407
## Which issue does this PR close?
* Closes #16279.
## Rationale for this change
Substrait round-trip tests and ~35 sqllogictest cases were failing with
errors like **"No table named 'generate_series()'"** when deserializing a
Substrait plan back into a DataFusion logical plan.
The root cause is that table function invocations (e.g. `generate_series`,
`range`) were being serialized as ordinary `ReadRel::NamedTable` scans, so the
consumer attempted to resolve a table named like the function call rather than
reconstructing a table function scan.
This change adds a stable mechanism to preserve table-function intent and
evaluated arguments in the Substrait plan, enabling correct reconstruction
during consumption and eliminating the "No table named ..." failures.
## What changes are included in this PR?
* **Catalog API extension:**
* Added `TableProvider::table_function_details()` (default `None`) and a
`TableFunctionDetails` struct carrying:
* function name (`&'static str`)
* evaluated argument literals (`&[ScalarValue]`)
* Kept a convenience `table_function_name()` helper that delegates to
`table_function_details()`.
* **Expose shared scalar constant:**
* Made `datafusion_common::scalar::NANOS_PER_DAY` public and reused it in
date handling to avoid local redefinition.
* **Built-in table function provider updates:**
* Updated `GenerateSeriesTable` to cache evaluated arguments and to
implement `table_function_details()` so the producer can serialize table
function calls without downcasting.
* Refactored argument parsing to consistently treat `NULL` / missing
inputs and to always represent table function calls with a complete 3-argument
list (start, end, step) where applicable.
* **Substrait producer support (logical plan):**
* Added a `ReadRel.advanced_extension` payload when a `TableScan`
originates from a table function provider.
* Introduced a dedicated protobuf message `TableFunctionReadRelExtension`
encoded as `google.protobuf.Any` using a shared `type_url` constant:
* `type.googleapis.com/datafusion.substrait.TableFunctionReadRel`
* This payload records the table function name and **evaluated** argument
literals (after defaults/coercions).
* **Substrait consumer support (logical plan):**
* Added parsing of the `ReadRel.advanced_extension` payload and
reconstruction of the table function scan by:
* looking up a registered table function by name
(`SubstraitConsumer::get_table_function`)
* calling `create_table_provider` with literal `Expr` arguments
* Added a `provider_override` path so a reconstructed provider can be used
even when the `NamedTable` reference would not resolve.
* **Code organization:**
* Moved duplicated VirtualTable literal conversion helpers into
`logical_plan::utils` (producer + consumer), reducing duplication and improving
maintainability.
* **Dependencies:**
* Added `datafusion-functions-table` dependency to `datafusion-substrait`
so built-in table function behavior (and tests) are available in round-trip
contexts.
* **Tests:**
* Added a producer unit test ensuring `generate_series` table scans emit
the expected advanced_extension payload and argument normalization.
* Added an integration test `table_function_roundtrip.rs` validating
round-trip correctness for:
* `generate_series(1, 4)`
* `range(2, 8, 3)`
* `generate_series(NULL, 5)`
## Are these changes tested?
### Before
```
❯ cargo test --test sqllogictests -- --substrait-round-trip array.slt:6472
Finished `test` profile [unoptimized + debuginfo] target(s) in 0.78s
Running bin/sqllogictests.rs
(target/debug/deps/sqllogictests-3a351d3176e1df99)
Completed 2 test files in 1 second
External error: 1 errors in file
/Users/kosiew/GitHub/datafusion/datafusion/sqllogictest/test_files/array.slt
1. query failed: DataFusion error: Error during planning: No table named
'generate_series()'
```
### After
```
❯ cargo test --test sqllogictests -- --substrait-round-trip array.slt:6472
Finished `test` profile [unoptimized + debuginfo] target(s) in 0.58s
Running bin/sqllogictests.rs
(target/debug/deps/sqllogictests-3a351d3176e1df99)
Completed 2 test files in 1 second
```
Yes.
* Added a producer-side unit test that verifies the
`ReadRel.advanced_extension` is present, uses the expected type URL, and
encodes normalized arguments for `generate_series(1, 3)`.
* Added new Substrait round-trip integration tests that serialize →
deserialize → execute and compare results for `generate_series` and `range`,
including a NULL-argument case.
These tests exercise both the new producer metadata emission path and the
new consumer reconstruction path, covering the scenario that caused the
sqllogictest failures.
## Are there any user-facing changes?
* **Improved Substrait round-trip behavior** for queries using built-in
table functions like `generate_series` and `range`. Previously, such plans
could fail to round-trip with "No table named ..." errors; with this change
they can be reconstructed and executed correctly.
* **New (optional) API hook for custom table providers:**
* `TableProvider::table_function_details()` allows custom table-function
providers to participate in Substrait serialization without needing
producer-side downcasting.
If this new method is considered a public API surface change, the PR should
carry the **`api change`** label.
## LLM-generated code disclosure
This PR includes LLM-generated code and comments. All LLM-generated content
has been manually reviewed and tested.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]