asubiotto opened a new issue, #21194:
URL: https://github.com/apache/datafusion/issues/21194
## Describe the bug
`GROUP BY` on a `RunEndEncoded(Int32, Dictionary(UInt32, Utf8))` column
fails with:
```
Arrow error: Invalid argument error: column types must match schema types,
expected RunEndEncoded("run_ends": non-null Int32, "values": Dictionary(UInt32, Utf8))
but found RunEndEncoded("run_ends": non-null Int32, "values": Utf8) at column index 0
```
## To Reproduce
```sql
SELECT group_col, SUM(value) as total
FROM t
GROUP BY group_col
```
Here, `group_col` has type `RunEndEncoded(Int32, Dictionary(UInt32, Utf8))` and `value` is a numeric column.
## Expected behavior
The query should execute successfully, grouping by the REE column values.
## Additional context
`GroupValuesRows` uses `RowConverter` to roundtrip group values. The
`RowConverter` strips Dictionary encoding from inside the REE values, producing
`REE(Utf8)` on output. The `dictionary_encode_if_necessary` function in
`row.rs` handles re-encoding for `Dictionary`, `Struct`, and `List` types but
has no `RunEndEncoded` arm, so the stripped array passes through unchanged and
`RecordBatch::try_new` rejects the type mismatch.
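To make the mechanism concrete, here is a minimal type-level model in plain Rust. It does not use arrow-rs. The `DataType` enum is a simplified hypothetical stand-in for arrow's, and `strip_dictionaries` and `restore` are hypothetical helpers that only mirror the behavior described above: the row roundtrip drops dictionary encoding at every nesting level, and re-encoding recurses into `Struct` and `List` but has no `RunEndEncoded` arm. The `handle_ree` flag toggles that missing arm.

```rust
// Simplified, hypothetical stand-in for arrow's `DataType` (not the real enum).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    UInt32,
    Utf8,
    Dictionary(Box<DataType>, Box<DataType>),
    RunEndEncoded(Box<DataType>, Box<DataType>),
    List(Box<DataType>),
    Struct(Vec<DataType>),
}

use DataType::*;

/// Hypothetical model of the row-format roundtrip: dictionary encoding is
/// dropped at every nesting level, including inside REE values.
fn strip_dictionaries(dt: &DataType) -> DataType {
    match dt {
        Dictionary(_, values) => strip_dictionaries(values),
        RunEndEncoded(run_ends, values) => {
            RunEndEncoded(run_ends.clone(), Box::new(strip_dictionaries(values)))
        }
        List(item) => List(Box::new(strip_dictionaries(item))),
        Struct(fields) => Struct(fields.iter().map(strip_dictionaries).collect()),
        other => other.clone(),
    }
}

/// Hypothetical model of re-encoding after the roundtrip. With
/// `handle_ree == false` it has arms only for Dictionary, Struct and List,
/// as the issue describes for `dictionary_encode_if_necessary`.
fn restore(output: &DataType, expected: &DataType, handle_ree: bool) -> DataType {
    match (output, expected) {
        // Re-encode a plain type back to the expected dictionary type.
        (_, Dictionary(..)) => expected.clone(),
        (Struct(out), Struct(exp)) => Struct(
            out.iter().zip(exp).map(|(o, e)| restore(o, e, handle_ree)).collect(),
        ),
        (List(out), List(exp)) => List(Box::new(restore(out, exp, handle_ree))),
        // The missing arm: recurse into the REE values type.
        (RunEndEncoded(run_ends, out), RunEndEncoded(_, exp)) if handle_ree => {
            RunEndEncoded(run_ends.clone(), Box::new(restore(out, exp, handle_ree)))
        }
        // Anything else passes through unchanged.
        _ => output.clone(),
    }
}

fn main() {
    // The column type from this issue.
    let expected = RunEndEncoded(
        Box::new(Int32),
        Box::new(Dictionary(Box::new(UInt32), Box::new(Utf8))),
    );
    let roundtripped = strip_dictionaries(&expected);

    // Without an REE arm the stripped type survives: RunEndEncoded(Int32, Utf8)
    println!("{:?}", restore(&roundtripped, &expected, false));
    // With the arm the expected type is restored:
    // RunEndEncoded(Int32, Dictionary(UInt32, Utf8))
    println!("{:?}", restore(&roundtripped, &expected, true));
}
```

The first result matches the "found" type in the error message above. A real fix would do the equivalent on arrays (rebuilding the REE array with dictionary-encoded values) rather than on types; this sketch only assumes that approach and does not show the arrow-rs calls it would involve.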