asubiotto opened a new issue, #21194:
URL: https://github.com/apache/datafusion/issues/21194

   ## Describe the bug
   
   `GROUP BY` on a `RunEndEncoded(Int32, Dictionary(UInt32, Utf8))` column fails with:
   
   ```
   Arrow error: Invalid argument error: column types must match schema types,
   expected RunEndEncoded("run_ends": non-null Int32, "values": Dictionary(UInt32, Utf8))
   but found RunEndEncoded("run_ends": non-null Int32, "values": Utf8) at column index 0
   ```
   
   ## To Reproduce
   
   ```sql
   SELECT group_col, SUM(value) as total
   FROM t
   GROUP BY group_col
   ```
   
   Where `group_col` has type `RunEndEncoded(Int32, Dictionary(UInt32, Utf8))`.
   
   ## Expected behavior
   
   The query should execute successfully, grouping by the logical values of the run-end-encoded (REE) column.
   
   ## Additional context
   
   `GroupValuesRows` uses `RowConverter` to round-trip group values. The
   `RowConverter` strips Dictionary encoding from inside the REE values, so the
   output type is `REE(Utf8)`. The `dictionary_encode_if_necessary` function in
   `row.rs` re-encodes `Dictionary`, `Struct`, and `List` types but has no
   `RunEndEncoded` arm, so the stripped array passes through unchanged and
   `RecordBatch::try_new` rejects the type mismatch.
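
The failure mode described above can be sketched with a toy model. This is not DataFusion or arrow-rs code: `ToyType`, `strip_dictionaries`, `re_encode`, `roundtrip`, and the `handle_ree` flag are all hypothetical names invented for illustration, and the model tracks only types, not array data. It assumes the re-encoding step recurses by matching on the expected type, as the paragraph above suggests.

```rust
// Toy stand-in for Arrow data types (NOT the real arrow-rs `DataType`).
#[derive(Debug, Clone, PartialEq)]
enum ToyType {
    Utf8,
    Dictionary(Box<ToyType>),
    List(Box<ToyType>),
    RunEndEncoded(Box<ToyType>),
}

// Models the row round trip: dictionary encoding is dropped at every
// nesting level, including inside run-end-encoded values.
fn strip_dictionaries(t: &ToyType) -> ToyType {
    match t {
        ToyType::Utf8 => ToyType::Utf8,
        ToyType::Dictionary(v) => strip_dictionaries(v),
        ToyType::List(v) => ToyType::List(Box::new(strip_dictionaries(v))),
        ToyType::RunEndEncoded(v) => {
            ToyType::RunEndEncoded(Box::new(strip_dictionaries(v)))
        }
    }
}

// Models the re-encoding step. `handle_ree` toggles a hypothetical
// RunEndEncoded arm; with `false` it mimics the behaviour in the report.
fn re_encode(expected: &ToyType, actual: ToyType, handle_ree: bool) -> ToyType {
    match (expected, actual) {
        // Dictionary expected: restore the dictionary type.
        (ToyType::Dictionary(_), _) => expected.clone(),
        // List: recurse into the child type.
        (ToyType::List(e), ToyType::List(a)) => {
            ToyType::List(Box::new(re_encode(e, *a, handle_ree)))
        }
        // Hypothetical REE arm: recurse into the values type.
        (ToyType::RunEndEncoded(e), ToyType::RunEndEncoded(a)) if handle_ree => {
            ToyType::RunEndEncoded(Box::new(re_encode(e, *a, handle_ree)))
        }
        // Anything else passes through unchanged.
        (_, actual) => actual,
    }
}

// Models the schema check performed when the output batch is built.
fn roundtrip(expected: &ToyType, handle_ree: bool) -> Result<ToyType, String> {
    let actual = re_encode(expected, strip_dictionaries(expected), handle_ree);
    if &actual == expected {
        Ok(actual)
    } else {
        Err(format!("expected {:?} but found {:?}", expected, actual))
    }
}
```

In this model, calling `roundtrip` on `RunEndEncoded(Dictionary(Utf8))` with `handle_ree = false` returns an `Err` whose shape mirrors the reported type mismatch, while `handle_ree = true` restores the original type. Whether the real fix should take this form is left to the maintainers.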


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

