crepererum opened a new issue, #6057:
URL: https://github.com/apache/arrow-datafusion/issues/6057

   ### Describe the bug
   
   Piping a dictionary-encoded string column through `make_array` and `unnest` fails with:
   
   `ArrowError(InvalidArgumentError("column types must match schema types, expected Dictionary(Int32, Utf8) but found Utf8 at column index 0"))`
   
   This might be either a DataFusion bug or a bug in arrow-rs.
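   
   For context, the quoted message matches the validation error that arrow-rs emits when a `RecordBatch` is constructed with a column whose data type does not match the declared schema. A minimal standalone sketch of that check (only an illustration of the error's shape, not the actual DataFusion code path that produces it):
   
   ```rust
   use std::sync::Arc;
   
   use arrow::array::{ArrayRef, StringArray};
   use arrow::datatypes::{DataType, Field, Schema};
   use arrow::record_batch::RecordBatch;
   
   fn main() {
       // Schema declares a dictionary-encoded string column ...
       let schema = Arc::new(Schema::new(vec![Field::new(
           "array",
           DataType::Dictionary(Box::new(DataType::Int32), Box::new(DataType::Utf8)),
           true,
       )]));
   
       // ... but the produced column is a plain Utf8 array.
       let column: ArrayRef = Arc::new(StringArray::from(vec!["x"]));
   
       // Fails with: "column types must match schema types, expected
       // Dictionary(Int32, Utf8) but found Utf8 at column index 0"
       let err = RecordBatch::try_new(schema, vec![column]).unwrap_err();
       println!("{err}");
   }
   ```
   
   So, judging from the error alone, somewhere along the `make_array` + `unnest` pipeline the produced array ends up as plain `Utf8` while the schema still declares `Dictionary(Int32, Utf8)`.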
   
   ### To Reproduce
   
   ```rust
   // Imports assume the re-exports from the `datafusion` crate at the tested commit.
   use datafusion::arrow::datatypes::DataType;
   use datafusion::logical_expr::{BuiltinScalarFunction, LogicalPlanBuilder};
   use datafusion::prelude::*;
   use datafusion::scalar::ScalarValue;
   
   #[tokio::test]
   async fn test_unnest_broken() {
       let ctx = SessionContext::new();
   
       // this works:
       //    let v = lit("x");
       let v = lit(ScalarValue::Dictionary(
           Box::new(DataType::Int32),
           Box::new(ScalarValue::new_utf8("x")),
       ));
   
       // values -> make_array -> unnest over the dictionary-encoded column
       let plan = LogicalPlanBuilder::values(vec![vec![v]])
           .unwrap()
           .project([Expr::ScalarFunction {
               fun: BuiltinScalarFunction::MakeArray,
               args: vec![col("column1")],
           }
           .alias("array")])
           .unwrap()
           .unnest_column("array")
           .unwrap()
           .build()
           .unwrap();
   
       let df = ctx.execute_logical_plan(plan).await.unwrap();
   
       // fails with the ArrowError quoted above
       df.collect().await.unwrap();
   }
   ```
   
   ### Expected behavior
   
   It just works, as it does with "normal" strings (i.e. when not using dictionary encoding).
   
   ### Additional context
   
   Tested on commit 9798fbcab1e04123f05e7692475e104310e0473a.

