bubulalabu commented on PR #18535:
URL: https://github.com/apache/datafusion/pull/18535#issuecomment-3579311821

   Thanks @alamb! Let me clarify.
   
   #### What is LATERAL?
   
   LATERAL allows the right side of a join to reference columns from the left 
side. For example:
   
   ```sql
   -- Without LATERAL this fails - t.x isn't visible to generate_series
   SELECT * FROM t, generate_series(1, t.x);
   
   -- With LATERAL it works
   SELECT * FROM t CROSS JOIN LATERAL generate_series(1, t.x);
   ```
   
   For each row in `t`, the function gets called with that row's `x` value.
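
As a rough illustration of those semantics (not DataFusion code), the sketch below evaluates a table function once per left row and pairs every produced value with that row. The name `cross_join_lateral` and the closure-based signature are hypothetical and exist only for this example.

```rust
/// Row-at-a-time model of `left CROSS JOIN LATERAL table_fn(left_row)`.
/// Hypothetical helper for illustration; not part of DataFusion.
fn cross_join_lateral<L: Clone, R>(
    left: &[L],
    table_fn: impl Fn(&L) -> Vec<R>,
) -> Vec<(L, R)> {
    let mut out = Vec::new();
    for row in left {
        // The function sees the current left row, which is what LATERAL permits.
        for produced in table_fn(row) {
            out.push((row.clone(), produced));
        }
    }
    out
}
```

A left row for which the function returns nothing contributes no output rows, as with any cross join.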
   
   #### What this PR adds
   
   This PR makes LATERAL work specifically for table functions. The problem is 
that the current `TableFunctionImpl` trait only accepts constant expressions at 
planning time:
   
   ```rust
   fn call(&self, args: &[Expr]) -> Result<Arc<dyn TableProvider>>
   ```
   
   That signature can't support LATERAL where the arguments are column 
references that vary per input row.
   
   This PR introduces a new `BatchedTableFunctionImpl` trait that receives 
already-evaluated arrays:
   
   ```rust
   async fn invoke_batch(&self, args: &[ArrayRef]) -> Result<Stream<Chunk>>
   ```
   
   The key idea is to process multiple rows in a single function call instead 
of calling the function once per row. For example, if you have 3 input rows, 
instead of calling the function 3 times, you call it once with arrays of length 
3:
   
   ```rust
   invoke_batch(&[
       Int64Array([1, 5, 10]),  // start values from 3 rows
       Int64Array([3, 7, 12])   // end values from 3 rows
   ])
   ```
   
   The function returns chunks with explicit row mapping so the executor knows 
which output rows came from which input rows.
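
To make the batching idea concrete, here is a minimal sketch that uses plain slices in place of Arrow arrays and a single synchronous chunk in place of a stream. The `Chunk` struct, its `input_row` field, and `generate_series_batch` are assumptions made for this example; the actual types in the PR may look different.

```rust
/// One output chunk: the generated values plus, for each value, the index
/// of the input row that produced it. Hypothetical shape for illustration.
#[derive(Debug, PartialEq)]
struct Chunk {
    values: Vec<i64>,
    input_row: Vec<usize>,
}

/// Batched `generate_series(start, end)` with inclusive bounds: a single
/// call handles every input row instead of one call per row.
fn generate_series_batch(starts: &[i64], ends: &[i64]) -> Chunk {
    assert_eq!(starts.len(), ends.len(), "argument arrays must have equal length");
    let mut chunk = Chunk { values: Vec::new(), input_row: Vec::new() };
    for (row, (&start, &end)) in starts.iter().zip(ends).enumerate() {
        for v in start..=end {
            chunk.values.push(v);
            // Record which input row this output value belongs to.
            chunk.input_row.push(row);
        }
    }
    chunk
}
```

Called with the three rows from the example above, `generate_series_batch(&[1, 5, 10], &[3, 7, 12])` yields the values `1, 2, 3, 5, 6, 7, 10, 11, 12` with the row mapping `0, 0, 0, 1, 1, 1, 2, 2, 2`.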
   
   #### How this compares to DuckDB
   
   DuckDB handles LATERAL differently. Its table functions don't know 
anything about LATERAL; they are simply called with values. The work happens in 
the optimizer, which tries to "decorrelate" LATERAL joins into hash joins when 
possible and falls back to nested loops when it can't.
   
   This PR takes a different approach where table functions are explicitly 
LATERAL-aware through the batched API. There's no decorrelation optimization 
yet, so it always uses a batched nested loop execution strategy. But the 
batched API could support adding DuckDB-style decorrelation later as an 
optimizer pass.
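
One way to picture the batched nested loop: once the function has returned a chunk, the executor uses the row mapping to look up the matching left row for every output value. The sketch below assumes the mapping is one input-row index per output row; `stitch_lateral` is a hypothetical name, and this is not the executor code from the PR.

```rust
/// Combine left rows with table function output by following the row
/// mapping. Hypothetical helper; assumes `input_row[i]` is the index of the
/// left row that produced `values[i]`.
fn stitch_lateral<L: Clone, R: Clone>(
    left: &[L],
    values: &[R],
    input_row: &[usize],
) -> Vec<(L, R)> {
    assert_eq!(values.len(), input_row.len(), "one mapping entry per output row");
    input_row
        .iter()
        .zip(values)
        .map(|(&i, v)| (left[i].clone(), v.clone()))
        .collect()
}
```

Because the mapping is explicit, left rows that produced no output simply never appear in it, and no per-row function call is needed to find that out.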
   
   #### Relationship to LATERAL subqueries
   
   This PR doesn't help with LATERAL subqueries; those still fail during 
physical planning. It only covers table functions, though the patterns here 
(batched execution, explicit correlation tracking) might inform future work on 
LATERAL subqueries.

