timsaucer opened a new issue, #13017:
URL: https://github.com/apache/datafusion/issues/13017

   ### Describe the bug
   
   When one window function takes another window function as its argument, 
the two cannot be evaluated in a single step. Breaking the operation into two 
steps succeeds: in the trivial example below, adding a `select` after the 
first window operation, so that its result becomes a plain column, gives the 
desired result.
   
   The difficulty is that with the DataFrame API it is common to build up a 
set of library functions that should accept any kind of expression, so a 
window expression should be valid input to another window function. If this 
is not supported, the end user has to track down every place where a library 
function returns a window function expression and force a `select` on the 
DataFrame there. This is particularly difficult when building libraries that 
generate large chains of operations.
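
To make the bookkeeping described above concrete, here is a minimal, self-contained sketch of the "hoist the inner window into an earlier select" rewrite that a library author currently has to do by hand. It is a toy model under stated assumptions: `ToyExpr`, `hoist_nested_windows`, and the `__win_N` alias scheme are hypothetical names invented for illustration and are not part of DataFusion's API.

```rust
// Toy model only: `ToyExpr` is a hypothetical stand-in, NOT DataFusion's `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum ToyExpr {
    Column(String),
    Window { func: String, args: Vec<ToyExpr> },
}

/// Rewrites `expr` so that no window expression appears as an argument of
/// another window expression. Each nested window is replaced by a column
/// reference, and the replaced expression is pushed onto `hoisted` together
/// with its generated alias. Entries in `hoisted` only depend on entries
/// pushed before them, so each one can become its own earlier select step.
fn hoist_nested_windows(expr: ToyExpr, hoisted: &mut Vec<(String, ToyExpr)>) -> ToyExpr {
    match expr {
        ToyExpr::Window { func, args } => {
            let args = args
                .into_iter()
                .map(|arg| {
                    if matches!(arg, ToyExpr::Window { .. }) {
                        // Flatten deeper nesting first, then alias this level.
                        let inner = hoist_nested_windows(arg, hoisted);
                        let alias = format!("__win_{}", hoisted.len());
                        hoisted.push((alias.clone(), inner));
                        ToyExpr::Column(alias)
                    } else {
                        arg
                    }
                })
                .collect();
            ToyExpr::Window { func, args }
        }
        // Plain columns need no rewriting.
        other => other,
    }
}
```

Applied to a model of `max(row_number())`, the function returns `max(__win_0)` and records `row_number()` under the alias `__win_0`, which mirrors the two-step `select` workaround in the reproduction below.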
   
   ### To Reproduce
   
   ```
       #[tokio::test]
       async fn window_over_window() -> Result<()> {
           use datafusion_common::record_batch;
           use datafusion_common::create_array;
           use datafusion_functions_aggregate::min_max::max_udaf;
           let ctx = SessionContext::new();
           let _ = ctx.register_batch("t", record_batch!(("a", Int32, vec![1, 2, 3]))?);
           let df = ctx.table("t").await?;
   
           // max() over a plain column that an earlier select must produce.
           let max_of_col = Expr::WindowFunction(WindowFunction::new(
               WindowFunctionDefinition::AggregateUDF(max_udaf()),
               vec![col("row_num")],
           ));
   
           // max() applied directly to another window expression, row_number().
           let max_of_window = Expr::WindowFunction(WindowFunction::new(
               WindowFunctionDefinition::AggregateUDF(max_udaf()),
               vec![row_number()],
           ));
   
           let passing_df = df
               .clone()
               .select(vec![row_number().alias("row_num")])?
               .select(vec![max_of_col])?;
   
           passing_df.show().await?;
   
           let failing_df = df.select(vec![max_of_window])?;
   
           failing_df.show().await?;
   
           Ok(())
       }
   ```
   
   ### Expected behavior
   
   These two approaches should yield identical results.
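
As a rough illustration of the values both approaches would be expected to produce for the three-row table in the reproduction, the following plain-Rust sketch mimics the two window steps on ordinary vectors. It does not use DataFusion; `row_numbers` and `max_over_all` are hypothetical helpers, and the sketch assumes a window with no partitioning or ordering, so every row sees the whole input.

```rust
/// Models row_number() over an unpartitioned window:
/// the 1-based position of each row.
fn row_numbers(n_rows: usize) -> Vec<u64> {
    (1..=n_rows as u64).collect()
}

/// Models max(x) over a window with no partitioning or ordering (assumed):
/// every row receives the maximum of the whole input.
fn max_over_all(values: &[u64]) -> Vec<Option<u64>> {
    let max = values.iter().copied().max();
    vec![max; values.len()]
}
```

Under these assumptions, `max_over_all(&row_numbers(3))` gives the value 3 on each of the three rows, whether the row numbers are materialized first or fed in directly.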
   
   ### Additional context
   
   This is a trivial example, but it is based on an actual use case.

