Kimahriman opened a new issue, #841:
URL: https://github.com/apache/datafusion-comet/issues/841

   ### Describe the bug
   
   I discovered this while working on new array functions. DataFusion's `make_array` doesn't play nicely with underlying Dictionary types. IMO this is yet another issue related to https://github.com/apache/datafusion/issues/11513.
   
   ### Steps to reproduce
   
   There are two separate cases, both easy to recreate with the existing unit test setup:
   
   ```
   checkSparkAnswerAndOperator(df.select(array(col("_13"), col("_13"))))
   ```
   produces
   ```
   org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 11.0 failed 1 times, most recent failure: Lost task 0.0 in stage 11.0 (TID 11) (10.10.0.29 executor driver): org.apache.comet.CometNativeException: Invalid argument error: column types must match schema types, expected List(Field { name: "item", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }) but found List(Field { name: "item", data_type: Dictionary(Int32, Utf8), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }) at column index 0
   ```
   
   Letting DataFusion infer the return type instead of specifying it results in
   ```
   org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 11.0 failed 1 times, most recent failure: Lost task 0.0 in stage 11.0 (TID 11) (10.10.0.29 executor driver): java.lang.NullPointerException: Cannot invoke "org.apache.comet.shaded.arrow.vector.dictionary.DictionaryProvider.lookup(long)" because "dictionaryProvider" is null
   ```
   which looks like an internal Comet issue. I haven't dug into it yet, but it is presumably fixable.
   
   And mixing dictionary and non-dictionary columns, like
   ```
   checkSparkAnswerAndOperator(df.select(array(col("_8"), col("_13"))))
   ```
   produces
   ```
   org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 8.0 failed 1 times, most recent failure: Lost task 0.0 in stage 8.0 (TID 8) (10.10.0.29 executor driver): org.apache.comet.CometNativeException: assertion `left == right` failed: Arrays with inconsistent types passed to MutableArrayData
     left: Utf8
    right: Dictionary(Int32, Utf8)
   ```
   The assertion fails in `datafusion/functions-nested/src/make_array.rs:231`.
   
   
   ### Expected behavior
   
   I'm not sure what the expected behavior or fix should be. Options are either to implement this function from scratch with better dictionary handling, or to add a wrapper around the UDF invocation that flattens dictionary-encoded arrays into their plain value type before calling it.
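The wrapper idea can be sketched as a pure-Scala toy model with no Arrow, DataFusion, or Comet dependency. Every name here (`ToyArray`, `StringArr`, `DictArr`, `unpack`, `makeArray`, `makeArrayFlattened`) is hypothetical and is not the real arrow-rs or Comet API. The model only assumes what the two errors above suggest: a `make_array`-style kernel takes its list element type from its inputs and requires all inputs to share one type.

```scala
// Hypothetical toy model; these are NOT arrow-rs, DataFusion, or Comet APIs.
sealed trait ToyArray { def typeName: String; def length: Int }

// Plain string array (stands in for Arrow Utf8).
final case class StringArr(values: Vector[Option[String]]) extends ToyArray {
  val typeName = "Utf8"
  def length: Int = values.length
}

// Dictionary-encoded string array (stands in for Dictionary(Int32, Utf8)):
// each row stores an index into a shared vector of distinct values.
final case class DictArr(keys: Vector[Option[Int]], dictionary: Vector[String]) extends ToyArray {
  val typeName = "Dictionary(Int32, Utf8)"
  def length: Int = keys.length
}

object DictionaryFlattening {
  // Decode a dictionary array into a plain one; plain arrays pass through.
  def unpack(arr: ToyArray): StringArr = arr match {
    case s: StringArr        => s
    case DictArr(keys, dict) => StringArr(keys.map(_.map(i => dict(i))))
  }

  // Logical value at a row, regardless of encoding.
  private def valueAt(arr: ToyArray, row: Int): Option[String] = arr match {
    case StringArr(values)   => values(row)
    case DictArr(keys, dict) => keys(row).map(i => dict(i))
  }

  // Stand-in for a make_array-style kernel: all inputs must share one type
  // (like the MutableArrayData assertion), and the list element type is that
  // input type. Returns (element type name, one list per row).
  def makeArray(inputs: Seq[ToyArray]): (String, Vector[List[Option[String]]]) = {
    require(inputs.nonEmpty, "make_array needs at least one input")
    val types = inputs.map(_.typeName).distinct
    require(types.size == 1, s"Arrays with inconsistent types: ${types.mkString(" vs ")}")
    val rows = Vector.tabulate(inputs.head.length)(r => inputs.map(valueAt(_, r)).toList)
    (types.head, rows)
  }

  // The wrapper idea: flatten every input before invoking the kernel,
  // so the element type is always plain Utf8.
  def makeArrayFlattened(inputs: Seq[ToyArray]): (String, Vector[List[Option[String]]]) =
    makeArray(inputs.map(unpack))
}
```

In this model, two dictionary inputs yield a `Dictionary(Int32, Utf8)` element type (mirroring the schema mismatch in the first case). Mixing plain and dictionary inputs fails the type check (mirroring the second case). The flattened variant always yields `Utf8`. In the native code, the equivalent step would presumably be a cast from the dictionary type to its value type before invoking the UDF, at the cost of materializing the decoded strings.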
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

