swgillespie opened a new issue, #7824:
URL: https://github.com/apache/arrow-datafusion/issues/7824
### Describe the bug
If you load a Parquet file that has a column of type `Map`, you can't write
a query that uses `GetIndexedField` (the `column['key']` indexing syntax) to
read from it. This appears to be because `GetIndexedField` only supports
`Struct` and `List` types, not `Map`.
### To Reproduce
```
DataFusion CLI v31.0.0
❯ create external table test stored as parquet location '../scratch';
0 rows in set. Query took 0.014 seconds.
❯ show columns from test;
+---------------+--------------+------------+-------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
| table_catalog | table_schema | table_name | column_name | data_type
| is_nullable |
+---------------+--------------+------------+-------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
| datafusion | public | test | ints | Map(Field {
name: "entries", data_type: Struct([Field { name: "key", data_type: Utf8,
nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field {
name: "value", data_type: Int64, nullable: false, dict_id: 0, dict_is_ordered:
false, metadata: {} }]), nullable: false, dict_id: 0, dict_is_ordered: false,
metadata: {} }, false) | NO |
| datafusion | public | test | strings | Map(Field {
name: "entries", data_type: Struct([Field { name: "key", data_type: Utf8,
nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field {
name: "value", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered:
false, metadata: {} }]), nullable: false, dict_id: 0, dict_is_ordered: false,
metadata: {} }, false) | NO |
| datafusion | public | test | timestamp | Utf8
| NO |
+---------------+--------------+------------+-------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
❯ select avg(ints['bytes']), strings['method'] from test group by strings['method'];
Error during planning: The expression to get an indexed field is only valid for `List` or `Struct` types, got Map(Field { name: "entries", data_type: Struct([Field { name: "key", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "value", data_type: Int64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }]), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, false)
```
### Expected behavior
I would expect the above query
```sql
SELECT avg(ints['bytes']), strings['method']
FROM test
GROUP BY strings['method'];
```
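to plan and run successfully, returning two columns: the average of
`ints['bytes']` and the corresponding `strings['method']` value, with one row
per distinct method.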
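To make the expected semantics concrete, here is a minimal, std-only Rust sketch. It models each `Map` row the way the schema above describes it (a list of `entries`, each with a `key` and a `value`) and assumes that `map['key']` returns the value of the first entry with a matching key, or NULL when the key is absent. `MapRow`, `get_indexed`, and `avg_bytes_by_method` are hypothetical names for illustration only; they are not DataFusion or Arrow APIs, and this is not how DataFusion would implement the fix.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory model of one row of an Arrow `Map` column:
/// a list of `entries`, each holding a `key` and a `value`.
pub struct MapRow<V> {
    pub entries: Vec<(String, V)>,
}

/// Sketch of `map_col['key']`: the value of the first entry whose key matches,
/// or `None` (SQL NULL) if the key is absent. NULL-on-missing-key is an
/// assumption here, not confirmed DataFusion behavior.
pub fn get_indexed<'a, V>(row: &'a MapRow<V>, key: &str) -> Option<&'a V> {
    row.entries.iter().find(|(k, _)| k == key).map(|(_, v)| v)
}

/// Sketch of
///   SELECT avg(ints['bytes']), strings['method'] FROM test GROUP BY strings['method']
/// over rows held in memory. NULL lookups are skipped by AVG, as in SQL.
pub fn avg_bytes_by_method(
    rows: &[(MapRow<i64>, MapRow<String>)],
) -> HashMap<Option<String>, Option<f64>> {
    // Group key -> (running sum, count of non-NULL values).
    let mut acc: HashMap<Option<String>, (i64, u64)> = HashMap::new();
    for (ints, strings) in rows {
        let method = get_indexed(strings, "method").cloned();
        let entry = acc.entry(method).or_insert((0, 0));
        if let Some(bytes) = get_indexed(ints, "bytes") {
            entry.0 += *bytes;
            entry.1 += 1;
        }
    }
    acc.into_iter()
        .map(|(k, (sum, n))| (k, if n == 0 { None } else { Some(sum as f64 / n as f64) }))
        .collect()
}
```

Under these assumptions, two `GET` rows with `bytes` of 10 and 30 average to 20.0, and a row whose `ints` map has no `bytes` key contributes NULL, which `AVG` ignores.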
### Additional context
_No response_