timsaucer opened a new issue, #13017:
URL: https://github.com/apache/datafusion/issues/13017
### Describe the bug
In the example below, one window function takes another window function as
its argument. Evaluating both in a single `select` fails, but splitting the
work into two steps succeeds: as the trivial example shows, adding a `select`
operation after the first window operation gives the desired result.
This is a problem because, when using the DataFrame API, it is common to
build up a set of library functions that should accept any kind of
expression, and a window expression should be valid input to another window
function. If this is not supported, the end user has to track down every
place where a library function returns a window function expression and force
a `select` operation on the DataFrame there. That is particularly difficult
when building libraries that generate large chains of operations.
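To make that bookkeeping concrete, here is a rough, self-contained sketch of what a library author would have to do by hand today. It uses a toy `Expr` enum, not DataFusion's `Expr`, and the names `hoist_nested_windows` and `__win_N` are made up for illustration; treat it as a model of the workaround under those assumptions, not as a proposed fix.

```rust
// Toy model only: this enum is hypothetical and is NOT DataFusion's `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    // Reference to an existing column by name.
    Column(String),
    // A window function call: function name plus argument expressions.
    Window { func: String, args: Vec<Expr> },
}

/// Rewrites `expr` so that no window call has a window call as a direct argument.
/// Each nested window call is pushed onto `stages` as an (alias, expr) pair and
/// replaced by a column reference to that alias. The caller would then have to
/// project the stages, in order, in earlier `select` steps.
fn hoist_nested_windows(expr: Expr, stages: &mut Vec<(String, Expr)>) -> Expr {
    match expr {
        Expr::Column(name) => Expr::Column(name),
        Expr::Window { func, args } => {
            let args = args
                .into_iter()
                .map(|arg| {
                    // Rewrite the argument first so deeper nesting is hoisted first.
                    let arg = hoist_nested_windows(arg, stages);
                    if let Expr::Window { .. } = arg {
                        // Made-up alias scheme, numbered by hoisting order.
                        let alias = format!("__win_{}", stages.len());
                        stages.push((alias.clone(), arg));
                        Expr::Column(alias)
                    } else {
                        arg
                    }
                })
                .collect();
            Expr::Window { func, args }
        }
    }
}
```

For a toy `max(row_number())`, this yields one stage (`__win_0` bound to `row_number()`) and an outer `max(__win_0)`, which mirrors the two-step `select` that succeeds in the reproduction below.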
### To Reproduce
```
#[tokio::test]
async fn window_over_window() -> Result<()> {
    use datafusion_common::record_batch;
    use datafusion_common::create_array;
    use datafusion_functions_aggregate::min_max::max_udaf;

    let ctx = SessionContext::new();
    let _ = ctx.register_batch("t", record_batch!(("a", Int32, vec![1, 2, 3]))?);
    let df = ctx.table("t").await?;

    // max() over a plain column that an earlier select has already produced.
    let max_of_col = Expr::WindowFunction(WindowFunction::new(
        WindowFunctionDefinition::AggregateUDF(max_udaf()),
        vec![col("row_num")],
    ));

    // max() taking the row_number() window expression directly as its argument.
    let max_of_window = Expr::WindowFunction(WindowFunction::new(
        WindowFunctionDefinition::AggregateUDF(max_udaf()),
        vec![row_number()],
    ));

    // Two steps: project row_number() first, then apply max() to that column.
    let passing_df = df
        .clone()
        .select(vec![row_number().alias("row_num")])?
        .select(vec![max_of_col])?;
    passing_df.show().await?;

    // One step: the nested window expression in a single select.
    // This is the case that fails.
    let failing_df = df.select(vec![max_of_window])?;
    failing_df.show().await?;

    Ok(())
}
```
### Expected behavior
These two approaches should yield identical results.
### Additional context
This is a trivial example, but it is based on an actual use case of mine.