Nikhil Sheoran created SPARK-49743:
--------------------------------------

             Summary: OptimizeCsvJsonExpr should not change the schema of underlying StructType in GetArrayStructFields
                 Key: SPARK-49743
                 URL: https://issues.apache.org/jira/browse/SPARK-49743
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.5.2
            Reporter: Nikhil Sheoran


The `OptimizeCsvJsonExprs` rule can change the schema of the underlying
`StructField` when the field name used to access the struct differs from the
name of the field in the underlying struct (for example, only in letter case).

This surfaces as a correctness issue: instead of returning the values of the
corresponding field, the query returns NULL.


A simple example query is:

```
SELECT
  from_json('[{"a": '||id||', "b": '|| (2*id) ||'}]', 'array<struct<a: INT, b: INT>>').a,
  from_json('[{"a": '||id||', "b": '|| (2*id) ||'}]', 'array<struct<a: INT, b: INT>>').A
FROM
  range(3) as t
```

Here, the result is `[0], [1], [2]` for `a` but `[null], [null], [null]` for
`A`. Since struct field access is case-insensitive by default
(`spark.sql.caseSensitive=false`), the result should have been `[0], [1], [2]`
for both.
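The failure mode can be illustrated with a small Python model. This is not Spark's code: `resolve_ordinal`, `parse`, and `extract` are hypothetical helpers. The model assumes that the rule prunes the `from_json` schema down to the single accessed field and that the pruned field keeps the accessor's spelling (`A`) instead of the underlying field's name (`a`), while JSON keys are matched against the schema case-sensitively.

```python
import json
from typing import List, Optional


def resolve_ordinal(fields: List[str], name: str) -> int:
    """Case-insensitive field resolution (models spark.sql.caseSensitive=false)."""
    matches = [i for i, f in enumerate(fields) if f.lower() == name.lower()]
    if len(matches) != 1:
        raise KeyError(f"cannot resolve {name!r} among {fields}")
    return matches[0]


def parse(text: str, fields: List[str]) -> List[List[Optional[int]]]:
    """Toy from_json for array<struct<...>>: JSON keys are looked up
    using the schema's field names exactly (case-sensitively)."""
    return [[obj.get(f) for f in fields] for obj in json.loads(text)]


def extract(text: str, fields: List[str], accessor: str,
            prune: bool, buggy: bool = False) -> List[Optional[int]]:
    """Toy model of `from_json(text, 'array<struct<fields>>').<accessor>`."""
    ordinal = resolve_ordinal(fields, accessor)
    if not prune:
        # Parse the full schema, then pick the resolved field by ordinal.
        return [row[ordinal] for row in parse(text, fields)]
    # Assumed pruning step: parse only the one field that is read.
    # buggy=True keeps the accessor's spelling ("A"); otherwise the
    # underlying field's name ("a") is kept.
    kept = accessor if buggy else fields[ordinal]
    return [row[0] for row in parse(text, [kept])]


if __name__ == "__main__":
    # Mirrors the example query over range(3).
    for i in range(3):
        text = json.dumps([{"a": i, "b": 2 * i}])
        print(extract(text, ["a", "b"], "A", prune=True, buggy=True),
              extract(text, ["a", "b"], "A", prune=True))
```

In this model, pruning with the accessor's spelling yields `[None]` for every row, while keeping the underlying field's name yields `[0]`, `[1]`, `[2]`, matching the expected result above.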



--
This message was sent by Atlassian Jira
(v8.20.10#820010)