timsaucer opened a new issue, #10328:
URL: https://github.com/apache/datafusion/issues/10328

   ### Describe the bug
   
   When you use the built-in `lead` or `lag` window functions on a column whose data type is a list or struct, the query fails with the error `Exception: Arrow error: Compute error: concat requires input of at least one array`
   
   I have traced the root cause to `list_to_array_of_size` in `datafusion/common/src/scalar/mod.rs`, which does not check whether the set of arrays it is about to concat is empty. It will be empty here because `WindowAggState::new()` calls `to_array_of_size(0)`. These calls work for primitive data, but list data needs an additional check. I am submitting a PR to resolve the issue.
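
The failure mode described above can be modelled without Arrow at all. The following is only a minimal sketch using the standard library: `ListArray`, `concat`, `repeat_unguarded` and `repeat_guarded` are hypothetical stand-ins, not DataFusion or Arrow APIs, and the guard shown is one plausible shape for the check, not necessarily what the PR does. It assumes only what the error message states, namely that the concat step rejects an empty list of inputs.

```rust
// Hypothetical stand-in for a list array: each inner Vec is one row's list value.
type ListArray = Vec<Vec<i64>>;

// Stand-in for a concat kernel that, as the error message says,
// refuses to run when it is handed zero input arrays.
fn concat(arrays: &[&ListArray]) -> Result<ListArray, String> {
    if arrays.is_empty() {
        return Err("concat requires input of at least one array".to_string());
    }
    Ok(arrays.iter().flat_map(|a| a.iter().cloned()).collect())
}

// Unguarded version: repeat the one-row array `size` times and concat.
// With size == 0 the input list is empty, so concat returns an error.
fn repeat_unguarded(arr: &ListArray, size: usize) -> Result<ListArray, String> {
    let arrays: Vec<&ListArray> = std::iter::repeat(arr).take(size).collect();
    concat(&arrays)
}

// Guarded version (assumed shape of a fix): return an empty array
// directly when size == 0 instead of calling concat with no inputs.
fn repeat_guarded(arr: &ListArray, size: usize) -> Result<ListArray, String> {
    if size == 0 {
        return Ok(Vec::new());
    }
    repeat_unguarded(arr, size)
}

fn main() {
    let row: ListArray = vec![vec![1, 2, 3]];
    println!("{:?}", repeat_unguarded(&row, 0));
    println!("{:?}", repeat_guarded(&row, 0));
    println!("{:?}", repeat_guarded(&row, 2));
}
```

In this model the unguarded call with a size of 0 reproduces the error text from the report, while the guarded call returns an empty array and leaves non-zero sizes unchanged.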
   
   ### To Reproduce
   
   The data file is a simple CSV:
   ```
   a,b,c
   1,2,3
   4,5,6
   7,8,9
   10,11,12
   ```
   
   Code to reproduce:
   
   ```
   use datafusion::{
       logical_expr::{
           expr::WindowFunction, BuiltInWindowFunction, WindowFrame, WindowFunctionDefinition,
       },
       prelude::*,
   };
   
   #[tokio::main]
   async fn main() -> datafusion::error::Result<()> {
   
       let ctx = SessionContext::new();
       let mut df = ctx
           .read_csv(
               "/Users/tsaucer/working/testing_ballista/lead_lag/example.csv",
               CsvReadOptions::default(),
           )
           .await?;
   
       df = df.with_column("array_col", make_array(vec![col("a"), col("b"), col("c")]))?;
   
       df.clone().show().await?;
   
       let lag_expr = Expr::WindowFunction(WindowFunction::new(
           WindowFunctionDefinition::BuiltInWindowFunction(
               BuiltInWindowFunction::Lead,
           ),
           vec![col("array_col")],
           vec![],
           vec![],
           WindowFrame::new(None),
           None,
       ));
   
       df = df.select(vec![
           col("a"),
           col("b"),
           col("c"),
           col("array_col"),
           lag_expr.alias("lagged"),
       ])?;
   
       df.show().await?;
   
       Ok(())
   }
   ```
   
   Results:
   ```
   +----+----+----+--------------+
   | a  | b  | c  | array_col    |
   +----+----+----+--------------+
   | 1  | 2  | 3  | [1, 2, 3]    |
   | 4  | 5  | 6  | [4, 5, 6]    |
   | 7  | 8  | 9  | [7, 8, 9]    |
   | 10 | 11 | 12 | [10, 11, 12] |
   +----+----+----+--------------+
   Error: ArrowError(ComputeError("concat requires input of at least one array"), None)
   ```
   
   ### Expected behavior
   
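   Lead and lag should work on list and struct columns. Here is the output from the PR I will put up shortly. Note that the reproduction calls `Lead`, so the column aliased `lagged` holds the next row's value.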
   
   ```
   +----+----+----+--------------+
   | a  | b  | c  | array_col    |
   +----+----+----+--------------+
   | 1  | 2  | 3  | [1, 2, 3]    |
   | 4  | 5  | 6  | [4, 5, 6]    |
   | 7  | 8  | 9  | [7, 8, 9]    |
   | 10 | 11 | 12 | [10, 11, 12] |
   +----+----+----+--------------+
   +----+----+----+--------------+--------------+
   | a  | b  | c  | array_col    | lagged       |
   +----+----+----+--------------+--------------+
   | 1  | 2  | 3  | [1, 2, 3]    | [4, 5, 6]    |
   | 4  | 5  | 6  | [4, 5, 6]    | [7, 8, 9]    |
   | 7  | 8  | 9  | [7, 8, 9]    | [10, 11, 12] |
   | 10 | 11 | 12 | [10, 11, 12] |              |
   +----+----+----+--------------+--------------+
   ```
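
For reference, the expected `lagged` column above follows ordinary lead semantics with an offset of 1. The snippet below is only an illustrative sketch in plain Rust: `lead` here is a hypothetical helper, not the DataFusion window function, and `None` stands in for the NULL in the last row.

```rust
// Hypothetical helper: for each row i, return the value at row i + offset,
// or None when that row falls past the end of the column.
fn lead<T: Clone>(column: &[T], offset: usize) -> Vec<Option<T>> {
    (0..column.len())
        .map(|i| column.get(i + offset).cloned())
        .collect()
}

fn main() {
    // Same values as array_col in the example CSV.
    let array_col = vec![
        vec![1, 2, 3],
        vec![4, 5, 6],
        vec![7, 8, 9],
        vec![10, 11, 12],
    ];
    for (row, next) in array_col.iter().zip(lead(&array_col, 1)) {
        println!("{:?} -> {:?}", row, next);
    }
}
```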
   
   ### Additional context
   
   This is the root cause of https://github.com/apache/datafusion-python/issues/647



