adragomir opened a new issue, #19943:
URL: https://github.com/apache/datafusion/issues/19943
### Describe the bug
When using deeply nested schemas, for example (DuckDB syntax):
```
CREATE OR REPLACE TABLE raw (
  timestamp TIMESTAMP_S, -- 0
  web STRUCT(
    webPageDetails STRUCT(
      name VARCHAR,
      pageViews STRUCT(value INT8)
    )
  ),
  identityMap MAP(
    VARCHAR,
    STRUCT(
      id VARCHAR,
      prim BOOLEAN
    )[]
  )
);
```
and running the query
```
SELECT
identityMap['ECID'][1]['id']
FROM
raw
```
The `datafusion_optimizer::analyzer::type_coercion` step transforms the plan
from
```
Projection: get_field(array_element(get_field(raw.identityMap, Utf8("ECID")), Int64(1)), Utf8("id"))
  TableScan: raw
```
to
```
Projection: get_field(array_element(CAST(get_field(raw.identityMap, Utf8("ECID")) AS List(nullable Struct("id": nullable Utf8, "prim": nullable Boolean))), Int64(1)), Utf8("id"))
  TableScan: raw
```
The reason is that inside `coerce_arguments_for_signature_with_scalar_udf`
and related functions, types are compared using plain equality. For arrays,
however, some functions return types that are otherwise the same and differ
only in the name of the nested list field ("element", in the case of Parquet
lists). So a cast is added even though the schemas are structurally
identical.
Arrow has
```
/// Compares the datatype with another, ignoring nested field names
/// and metadata.
pub fn equals_datatype(&self, other: &DataType) -> bool {
```
Maybe we should use this instead of plain comparison?
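To illustrate the difference between the two kinds of comparison, here is a minimal, self-contained sketch. It does not use the real Arrow or DataFusion types: `DataType`, `Field`, `equals_ignoring_names`, `needs_cast`, and `list_of_identity` are simplified stand-ins invented for this example, and the list field names "element" and "item" are assumed only to show a mismatch. It is meant to show the idea, not the actual implementation.

```rust
// Simplified stand-ins for Arrow's type model; NOT the real arrow API.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Utf8,
    Boolean,
    List(Box<Field>),
    Struct(Vec<Field>),
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
}

impl Field {
    fn new(name: &str, data_type: DataType, nullable: bool) -> Self {
        Field { name: name.to_string(), data_type, nullable }
    }
}

impl DataType {
    /// Structural comparison: ignores nested field names, but still
    /// compares nullability and child types recursively.
    fn equals_ignoring_names(&self, other: &DataType) -> bool {
        match (self, other) {
            (DataType::List(a), DataType::List(b)) => {
                a.nullable == b.nullable
                    && a.data_type.equals_ignoring_names(&b.data_type)
            }
            (DataType::Struct(a), DataType::Struct(b)) => {
                a.len() == b.len()
                    && a.iter().zip(b.iter()).all(|(x, y)| {
                        x.nullable == y.nullable
                            && x.data_type.equals_ignoring_names(&y.data_type)
                    })
            }
            // Leaf types (and mismatched variants) fall back to plain equality.
            _ => self == other,
        }
    }
}

/// Hypothetical coercion check: a cast is needed only when the types
/// differ structurally, not when they differ only in nested field names.
fn needs_cast(current: &DataType, wanted: &DataType) -> bool {
    !current.equals_ignoring_names(wanted)
}

/// Builds List(Struct(id: Utf8, prim: Boolean)) with the given name for
/// the list's inner field.
fn list_of_identity(list_field_name: &str) -> DataType {
    let entry = DataType::Struct(vec![
        Field::new("id", DataType::Utf8, true),
        Field::new("prim", DataType::Boolean, true),
    ]);
    DataType::List(Box::new(Field::new(list_field_name, entry, true)))
}

fn main() {
    let from_parquet = list_of_identity("element");
    let from_function = list_of_identity("item");

    // Plain equality sees two different types because of the field name.
    println!("strictly equal: {}", from_parquet == from_function);
    // The structural check sees the same type, so no cast would be added.
    println!("needs cast: {}", needs_cast(&from_parquet, &from_function));
}
```

Note that this sketch also ignores struct field names, which may or may not be desirable for expressions such as `get_field` that look fields up by name.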
### To Reproduce
_No response_
### Expected behavior
_No response_
### Additional context
_No response_