Re: [PR] Convert nth_value builtIn function to UDWF [datafusion]

via GitHub Mon, 04 Nov 2024 19:28:50 -0800


jcsherin commented on PR #13201:
URL: https://github.com/apache/datafusion/pull/13201#issuecomment-2456154778


   > I think it would be better to update WindowUDFImpl in a followup PR for 
enhancement right? I can skip this test case in the scope of this PR. Correct 
me if I'm wrong please
   
   Sure, we can improve the API in another PR.
   
   Here is a workaround that fixes the failing test:
   ```rust
   // In datafusion/physical-plan/src/windows/mod.rs
   fn create_udwf_window_expr(
       fun: &Arc<WindowUDF>,
       args: &[Arc<dyn PhysicalExpr>],
       input_schema: &Schema,
       name: String,
       ignore_nulls: bool,
   ) -> Result<Arc<dyn BuiltInWindowFunctionExpr>> {
       // need to get the types into an owned vec for some reason
       let input_types: Vec<_> = args
           .iter()
           .map(|arg| arg.data_type(input_schema))
           .collect::<Result<_>>()?;
   
       let udwf_expr =
           Arc::new(WindowUDFExpr {
               fun: Arc::clone(fun),
               args: args.to_vec(),
               input_types,
               name,
               is_reversed: false,
               ignore_nulls,
           });
   
       /// Early validation of input expressions
       ///
       /// We create a partition evaluator because in the user-defined window
       /// implementation this is where code for parsing input expressions
       /// exist. The benefits are:
       /// - If any of the input expressions are invalid we catch them early
       /// in the planning phase, rather than during execution.
       /// - Maintains compatibility with built-in (now removed) window
       /// functions validation behavior.
       /// - Predictable and reliable error handling.
       ///
       /// See discussion here:
       /// 
https://github.com/apache/datafusion/pull/13201#issuecomment-2454209975
       let _ = udwf_expr.create_evaluator()?;
   
       Ok(udwf_expr)
   }
   ```
   
   I verified that this works in your branch.
   ```sql
   DataFusion CLI v42.2.0
   > CREATE TABLE t1(v1 BIGINT);
   0 row(s) fetched.
   Elapsed 0.019 seconds.
   
   > EXPLAIN SELECT NTH_VALUE('+Inf'::Double, v1) OVER (PARTITION BY v1) FROM 
t1;
   
+--------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   | plan_type    | plan                                                        
                                                                                
                                                                                
                             |
   
+--------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   | logical_plan | Projection: nth_value(Utf8("+Inf"),t1.v1) PARTITION BY 
[t1.v1] ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING                
                                                                                
                                  |
   |              |   WindowAggr: windowExpr=[[nth_value(Float64(inf), t1.v1) 
PARTITION BY [t1.v1] ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING 
AS nth_value(Utf8("+Inf"),t1.v1) PARTITION BY [t1.v1] ROWS BETWEEN UNBOUNDED 
PRECEDING AND UNBOUNDED FOLLOWING]] |
   |              |     TableScan: t1 projection=[v1]                           
                                                                                
                                                                                
                             |
   
+--------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   1 row(s) fetched.
   Elapsed 0.018 seconds.
   
   > SELECT NTH_VALUE('+Inf'::Double, v1) OVER (PARTITION BY v1) FROM t1;
   This feature is not implemented: There is only support Literal types for 
field at idx: 1 in Window Function
   ```
   
   This workaround may not be ideal, but at least we do not need to skip a test 
here. Also please feel free to update the code/comments as you see fit.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Convert nth_value builtIn function to UDWF [datafusion]

Reply via email to