bubulalabu commented on PR #18535:
URL: https://github.com/apache/datafusion/pull/18535#issuecomment-3579311821
Thanks @alamb! Let me clarify.
#### What is LATERAL?
LATERAL allows the right side of a join to reference columns from the left
side. For example:
```sql
-- Without LATERAL this fails - t.x isn't visible to generate_series
SELECT * FROM t, generate_series(1, t.x);
-- With LATERAL it works
SELECT * FROM t CROSS JOIN LATERAL generate_series(1, t.x);
```
For each row in `t`, the function gets called with that row's `x` value.
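To make the per-row semantics concrete, here is a minimal sketch in plain Rust, not DataFusion code. It assumes `t` has a single `Int64` column `x`, and `generate_series` below is a hypothetical stand-in that returns an inclusive range:

```rust
/// Hypothetical stand-in for `generate_series(start, end)`: an inclusive range.
fn generate_series(start: i64, end: i64) -> Vec<i64> {
    (start..=end).collect()
}

/// Row-at-a-time semantics of `t CROSS JOIN LATERAL generate_series(1, t.x)`.
/// `t` is modeled as its single column `x`; each output row is `(x, value)`.
fn cross_join_lateral(t: &[i64]) -> Vec<(i64, i64)> {
    let mut out = Vec::new();
    for &x in t {
        // The function is evaluated once per left row, with that row's `x`.
        for value in generate_series(1, x) {
            out.push((x, value));
        }
    }
    out
}
```

In this sketch, a left row whose series is empty (for example `x = 0`) contributes no output rows, as in any cross join.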
#### What this PR adds
This PR makes LATERAL work specifically for table functions. The problem is
that the current `TableFunctionImpl` trait only accepts constant expressions at
planning time:
```rust
fn call(&self, args: &[Expr]) -> Result<Arc<dyn TableProvider>>
```
That signature can't support LATERAL where the arguments are column
references that vary per input row.
This PR introduces a new `BatchedTableFunctionImpl` trait that receives
already-evaluated arrays:
```rust
async fn invoke_batch(&self, args: &[ArrayRef]) -> Result<Stream<Chunk>>
```
The key idea is to process multiple rows in a single function call instead
of calling the function once per row. For example, if you have 3 input rows,
instead of calling the function 3 times, you call it once with arrays of length
3:
```rust
invoke_batch(&[
    Int64Array([1, 5, 10]), // start values from 3 rows
    Int64Array([3, 7, 12]), // end values from 3 rows
])
```
The function returns chunks with explicit row mapping so the executor knows
which output rows came from which input rows.
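As a rough illustration of the row mapping, here is a sketch using plain `Vec`s instead of Arrow arrays. The `Chunk` struct, its field names, and `generate_series_batch` are hypothetical and only model the idea; the actual types in the PR may look different:

```rust
/// Hypothetical output chunk; the PR's actual `Chunk` type may differ.
#[derive(Debug, PartialEq)]
struct Chunk {
    /// Generated values, one per output row.
    values: Vec<i64>,
    /// For each output row, the index of the input row that produced it.
    input_row_indices: Vec<usize>,
}

/// Batched `generate_series`: `starts[i]` and `ends[i]` are the arguments
/// taken from input row `i`. All input rows are handled in a single call.
fn generate_series_batch(starts: &[i64], ends: &[i64]) -> Chunk {
    assert_eq!(starts.len(), ends.len(), "argument arrays must have equal length");
    let mut chunk = Chunk {
        values: Vec::new(),
        input_row_indices: Vec::new(),
    };
    for (row, (&start, &end)) in starts.iter().zip(ends).enumerate() {
        for value in start..=end {
            chunk.values.push(value);
            // Record which input row this output row belongs to.
            chunk.input_row_indices.push(row);
        }
    }
    chunk
}
```

With the three input rows from the example, `generate_series_batch(&[1, 5, 10], &[3, 7, 12])` would yield the values `1, 2, 3, 5, 6, 7, 10, 11, 12` with row indices `0, 0, 0, 1, 1, 1, 2, 2, 2`.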
#### How this compares to DuckDB
DuckDB handles LATERAL differently. Their table functions don't know
anything about LATERAL - they just get called with values. The magic happens in
the optimizer which tries to "decorrelate" LATERAL joins into hash joins when
possible, falling back to nested loops when it can't.
This PR takes a different approach where table functions are explicitly
LATERAL-aware through the batched API. There's no decorrelation optimization
yet, so it always uses a batched nested loop execution strategy. But the
batched API could support adding DuckDB-style decorrelation later as an
optimizer pass.
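One way to picture the batched nested loop step is sketched below. This is an assumption about the general shape, not the PR's executor: the left batch is modeled as a single string column, and `join_with_mapping` is a hypothetical helper that uses the row mapping to pair each function output row with the left row that produced it:

```rust
/// Combine a left-side batch with table-function output via the row mapping.
/// `left` holds one value per input row, and `input_row_indices[i]` names the
/// left row that produced `values[i]`. (Hypothetical helper for illustration.)
fn join_with_mapping(
    left: &[&str],
    values: &[i64],
    input_row_indices: &[usize],
) -> Vec<(String, i64)> {
    input_row_indices
        .iter()
        .zip(values)
        // Look up the originating left row and emit the joined output row.
        .map(|(&row, &value)| (left[row].to_string(), value))
        .collect()
}
```

Left rows that never appear in the mapping produce no output, which matches cross join semantics when the function returns nothing for that row.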
#### Relationship to LATERAL subqueries
This PR doesn't help with LATERAL subqueries - those still fail during
physical planning. This is only for table functions. Though the patterns here
(batched execution, explicit correlation tracking) might inform future work on
LATERAL subqueries.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]