comphead opened a new issue, #22082:
URL: https://github.com/apache/datafusion/issues/22082

   ### Is your feature request related to a problem or challenge?
   
   DataFusion's `lag` and `lead` window functions only accept a **scalar** value
   as the third (default) argument. The implementation in
   `datafusion/functions-window/src/lead_lag.rs` extracts the default at planning
   time via `get_scalar_value_from_args(input_exprs, 2)` and stores it as a
   `ScalarValue` on the `WindowUDFFieldArgs` / partition evaluator state.

   This means queries like:

   ```sql
   SELECT
     b,
     LAG(a, 1, c)  OVER (ORDER BY b) AS lg,
     LEAD(a, 1, c) OVER (ORDER BY b) AS ld
   FROM t
   ```

   …where the default expression is a column reference (or any non-literal
   expression), cannot be planned natively in DataFusion. Spark accepts an
   arbitrary `Expression` here, and the value is meant to be evaluated per row
   when the offset row does not exist. Downstream projects building Spark
   compatibility on DataFusion (e.g. Apache DataFusion Comet) currently have to
   fall back to Spark for this pattern.
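
To make the requested semantics concrete, here is a minimal, self-contained Rust sketch of what "evaluated per row" means for the query above. It is only a model over plain slices, not DataFusion code: the function names `lag_with_row_default` and `lead_with_row_default` are hypothetical, the partition is assumed to be already sorted by `b`, and NULL handling is left out.

```rust
/// Hypothetical model of `LAG(value, offset, default_expr)` over one
/// partition that is already ordered. `defaults[i]` stands for the default
/// expression evaluated at row `i` (for example column `c`).
fn lag_with_row_default<T: Clone>(values: &[T], defaults: &[T], offset: usize) -> Vec<T> {
    assert_eq!(values.len(), defaults.len(), "one default value per row");
    (0..values.len())
        .map(|i| match i.checked_sub(offset) {
            // The offset row exists: take its value.
            Some(src) => values[src].clone(),
            // No row `offset` positions back: use the default of the CURRENT row.
            None => defaults[i].clone(),
        })
        .collect()
}

/// Same idea for `LEAD`: look `offset` rows ahead instead of behind.
fn lead_with_row_default<T: Clone>(values: &[T], defaults: &[T], offset: usize) -> Vec<T> {
    assert_eq!(values.len(), defaults.len(), "one default value per row");
    (0..values.len())
        .map(|i| match i.checked_add(offset).and_then(|src| values.get(src)) {
            Some(v) => v.clone(),
            None => defaults[i].clone(),
        })
        .collect()
}
```

With `a = [10, 20, 30]` and `c = [-1, -2, -3]`, an offset of 1 gives `lg = [-1, 10, 20]` and `ld = [20, 30, -3]`: each missing neighbour is filled from `c` at the row being produced, which a single scalar default cannot express.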

   Allow the third argument of `lag` / `lead` to be any expression, evaluated
   per row in the partition evaluator:

   1. In `parse_default_value` / partition-evaluator construction, accept an
      arbitrary `PhysicalExpr` for the default rather than coercing it to a
      `ScalarValue`.
   2. In `evaluate_all` / `evaluate_all_with_ignore_null` (and the
      `shift_with_default_value` path), when the offset row does not exist, take
      the default from the per-row evaluation of the default-expression column
      instead of cloning a single `ScalarValue`.
   3. Preserve the fast path: when the default expression is a literal (the
      common case today), continue to materialize it once as a scalar to avoid
      overhead.
   4. Update the field/return-type derivation in `field` so the default
      expression's data type still drives result-type unification (the existing
      `NULL_FIELD` fallback already covers the `Literal(NULL)` default).
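
Steps 1 to 3 could be sketched roughly as follows. This is a standalone illustration using only the Rust standard library, not the actual `lead_lag.rs` code: the `DefaultValue` enum, its `at` method, and `shift_with_default` are hypothetical names, and the sign convention (positive offset looks backwards like `lag`, negative looks forwards like `lead`) is an assumption of this sketch.

```rust
/// Hypothetical representation of the third argument after planning.
enum DefaultValue<T> {
    /// Literal default: materialized once and reused for every row (fast path).
    Scalar(T),
    /// Non-literal default: the expression evaluated into one value per row.
    /// Must have the same length as the partition.
    PerRow(Vec<T>),
}

impl<T: Clone> DefaultValue<T> {
    /// Default to use when the offset row for `row` does not exist.
    fn at(&self, row: usize) -> T {
        match self {
            DefaultValue::Scalar(v) => v.clone(),
            DefaultValue::PerRow(col) => col[row].clone(),
        }
    }
}

/// Shift `values` by a signed offset, filling gaps from `default`.
/// Assumption of this sketch: positive = lag, negative = lead.
fn shift_with_default<T: Clone>(values: &[T], offset: i64, default: &DefaultValue<T>) -> Vec<T> {
    let len = values.len() as i64;
    (0..len)
        .map(|row| match row.checked_sub(offset) {
            // Source row is inside the partition: copy its value.
            Some(src) if (0..len).contains(&src) => values[src as usize].clone(),
            // Source row is out of range: fall back to the default for this row.
            _ => default.at(row as usize),
        })
        .collect()
}
```

Keeping the literal case as its own variant means the existing scalar behaviour needs no per-row column at all, while the per-row variant only pays for evaluating the default expression when the query actually uses one.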
   
   ### Describe the solution you'd like
   
   _No response_
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

