Kimahriman opened a new issue, #841: URL: https://github.com/apache/datafusion-comet/issues/841
### Describe the bug

Discovered this while working on new array functions. DataFusion's `make_array` doesn't play nice with underlying Dictionary types. Kinda yet another issue related to https://github.com/apache/datafusion/issues/11513 imo.

### Steps to reproduce

Two separate cases that are easy to recreate with the existing unit test setup:

```
checkSparkAnswerAndOperator(df.select(array(col("_13"), col("_13"))))
```

produces

```
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 11.0 failed 1 times, most recent failure: Lost task 0.0 in stage 11.0 (TID 11) (10.10.0.29 executor driver): org.apache.comet.CometNativeException: Invalid argument error: column types must match schema types, expected List(Field { name: "item", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }) but found List(Field { name: "item", data_type: Dictionary(Int32, Utf8), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }) at column index 0
```

Letting DataFusion infer the return type instead of specifying it results in

```
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 11.0 failed 1 times, most recent failure: Lost task 0.0 in stage 11.0 (TID 11) (10.10.0.29 executor driver): java.lang.NullPointerException: Cannot invoke "org.apache.comet.shaded.arrow.vector.dictionary.DictionaryProvider.lookup(long)" because "dictionaryProvider" is null
```

which seems like an internal Comet issue? I haven't dug into this, but it is presumably fixable.
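To make the first error more concrete, here is a toy model in plain Rust (no Arrow dependency) of why a dictionary-encoded input ends up with a different item type than the plain `Utf8` the schema expects, and how unpacking the dictionary first would avoid that. Every name below (`DataType`, `Column`, `make_array_item_type`, `make_array_item_type_flattened`) is hypothetical and invented for illustration; none of it is the Arrow, DataFusion, or Comet API, and it is not a proposed patch.

```rust
// Toy model only: these types are hypothetical, NOT the Arrow/DataFusion API.

/// Element type of a column, mirroring the names in the error messages.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Utf8,
    Dictionary, // stands in for Dictionary(Int32, Utf8)
}

/// A nullable string column, either plain or dictionary encoded.
#[derive(Debug, Clone, PartialEq)]
enum Column {
    Plain(Vec<Option<String>>),
    Dict {
        keys: Vec<Option<usize>>, // index into `values`, None = null
        values: Vec<String>,
    },
}

impl Column {
    fn data_type(&self) -> DataType {
        match self {
            Column::Plain(_) => DataType::Utf8,
            Column::Dict { .. } => DataType::Dictionary,
        }
    }

    /// Unpack a dictionary column into plain values; plain columns pass through.
    fn flatten(&self) -> Column {
        match self {
            Column::Plain(_) => self.clone(),
            Column::Dict { keys, values } => Column::Plain(
                keys.iter()
                    .map(|k| k.map(|i| values[i].clone()))
                    .collect(),
            ),
        }
    }
}

/// Stand-in for a make_array-like function: the list item type is taken from
/// the inputs, and mixed input types are rejected.
fn make_array_item_type(inputs: &[Column]) -> Result<DataType, String> {
    let first = inputs
        .first()
        .ok_or_else(|| "no inputs".to_string())?
        .data_type();
    for col in inputs {
        let dt = col.data_type();
        if dt != first {
            return Err(format!(
                "inconsistent types: left: {:?} right: {:?}",
                first, dt
            ));
        }
    }
    Ok(first)
}

/// Hypothetical wrapper: flatten every input before calling the function,
/// so the item type is always the plain value type.
fn make_array_item_type_flattened(inputs: &[Column]) -> Result<DataType, String> {
    let flat: Vec<Column> = inputs.iter().map(Column::flatten).collect();
    make_array_item_type(&flat)
}
```

In this model, two dictionary inputs yield a `Dictionary` item type rather than `Utf8`, while the flattening wrapper always yields `Utf8`. Whether the real fix belongs in a wrapper like this or in the function itself is a separate question.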
And doing a mixed dictionary/non-dictionary call like

```
checkSparkAnswerAndOperator(df.select(array(col("_8"), col("_13"))))
```

produces

```
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 8.0 failed 1 times, most recent failure: Lost task 0.0 in stage 8.0 (TID 8) (10.10.0.29 executor driver): org.apache.comet.CometNativeException: assertion `left == right` failed: Arrays with inconsistent types passed to MutableArrayData left: Utf8 right: Dictionary(Int32, Utf8)
```

in datafusion/functions-nested/src/make_array.rs:231

### Expected behavior

Not sure what the expected behavior or fix is. Either implement this function from scratch with better dictionary handling, or add some wrapper around invoking the UDF to flatten dictionary-encoded arrays.

### Additional context

_No response_

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org