adragomir opened a new issue, #19943:
URL: https://github.com/apache/datafusion/issues/19943

   ### Describe the bug
   
   With a deeply nested schema such as the following (DuckDB syntax):
   ```
   CREATE OR REPLACE TABLE raw (
     timestamp TIMESTAMP_S,  -- 0
     web STRUCT(
       webPageDetails STRUCT(
         name VARCHAR, 
         pageViews STRUCT(value INT8)
       )
     ), 
     identityMap MAP(
       VARCHAR, 
       STRUCT(
         id VARCHAR, 
         prim BOOLEAN
       )[]
     )
   );
   ```
   and running the query
   ```
       SELECT
           identityMap['ECID'][1]['id']
       FROM
           raw
   ```
   
   The `datafusion_optimizer::analyzer::type_coercion` step transforms the plan 
from 
   ```
       Projection: get_field(array_element(get_field(raw.identityMap, 
Utf8("ECID")), Int64(1)), Utf8("id"))
         TableScan: raw
   ```
   to 
   ```
       Projection: get_field(array_element(CAST(get_field(raw.identityMap, 
Utf8("ECID")) AS List(nullable Struct("id": nullable Utf8, "prim": nullable 
Boolean))), Int64(1)), Utf8("id"))
         TableScan: raw
   ```
   The cause is that `coerce_arguments_for_signature_with_scalar_udf` and
related functions compare types using plain equality. For arrays, however,
some functions return a type that differs only in the name of the nested
field ("element", in the case of Parquet lists). A cast is therefore added
even though the types are otherwise identical.
   
   Arrow has:
   ```
       /// Compares the datatype with another, ignoring nested field names
       /// and metadata.
       pub fn equals_datatype(&self, other: &DataType) -> bool {
   ```
   Maybe type coercion should use this instead of plain equality?
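
To make the difference concrete, here is a minimal, self-contained sketch. It does not use the Arrow or DataFusion APIs: `Ty`, `Field`, `list_of`, `entry` and `equals_ignoring_names` are hypothetical stand-ins written for this illustration, and the second list field name (`"item"`) is an assumption. It only shows how plain equality rejects two list types that differ in the nested field name, while a comparison that ignores nested names accepts them.

```rust
// Toy model of nested types. NOT the Arrow API; every name here is hypothetical.
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Utf8,
    Boolean,
    List(Box<Field>),
    Struct(Vec<Field>),
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    ty: Ty,
    nullable: bool,
}

// Build a nullable field with the given name and type.
fn field(name: &str, ty: Ty) -> Field {
    Field { name: name.to_string(), ty, nullable: true }
}

// Build a list type whose single child field carries `name`.
fn list_of(name: &str, ty: Ty) -> Ty {
    Ty::List(Box::new(field(name, ty)))
}

// The struct stored in the map values of the schema in this report.
fn entry() -> Ty {
    Ty::Struct(vec![field("id", Ty::Utf8), field("prim", Ty::Boolean)])
}

impl Ty {
    // Structural comparison that ignores nested field names but still
    // checks nullability and child types.
    fn equals_ignoring_names(&self, other: &Ty) -> bool {
        match (self, other) {
            (Ty::List(a), Ty::List(b)) => {
                a.nullable == b.nullable && a.ty.equals_ignoring_names(&b.ty)
            }
            (Ty::Struct(a), Ty::Struct(b)) => {
                a.len() == b.len()
                    && a.iter().zip(b.iter()).all(|(x, y)| {
                        x.nullable == y.nullable && x.ty.equals_ignoring_names(&y.ty)
                    })
            }
            // Leaf types, or mismatched variants: fall back to plain equality.
            _ => self == other,
        }
    }
}

fn main() {
    // Same shape, different name on the list's child field.
    let from_parquet = list_of("element", entry());
    let from_function = list_of("item", entry()); // "item" is an assumed name

    // Plain equality sees two different types, which is what triggers the cast.
    assert!(from_parquet != from_function);
    // The name-insensitive comparison treats them as the same type.
    assert!(from_parquet.equals_ignoring_names(&from_function));
    println!("plain equality differs, name-insensitive comparison matches");
}
```

Note that a comparison like this also ignores struct field names, so whether it is safe to use everywhere in type coercion would need to be checked.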
   
   
   
   ### To Reproduce
   
   _No response_
   
   ### Expected behavior
   
   _No response_
   
   ### Additional context
   
   _No response_



