Nikhil Sheoran created SPARK-49743:
--------------------------------------
Summary: OptimizeCsvJsonExpr should not change the schema of
underlying StructType in GetArrayStructFields
Key: SPARK-49743
URL: https://issues.apache.org/jira/browse/SPARK-49743
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 3.5.2
Reporter: Nikhil Sheoran
The `OptimizeCsvJsonExprs` rule can change the schema of the underlying
`StructType` when the field name used to access the struct differs (for
example, in case) from the name of the field in the underlying struct.
This surfaces as a correctness issue: instead of returning the values of the
corresponding column, the query returns NULL.
A simple example query is:
```
SELECT
from_json('[{"a": '||id||', "b": '|| (2*id) ||'}]', 'array<struct<a: INT, b: INT>>').a,
from_json('[{"a": '||id||', "b": '|| (2*id) ||'}]', 'array<struct<a: INT, b: INT>>').A
FROM
range(3) as t
```
Here, the result is `[0], [1], [2]` for `a` but `[null], [null], [null]` for
`A`. Since struct field access is case-insensitive, the result should have
been `[0], [1], [2]` for both.
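
The failure mode described above can be illustrated with a small toy model in plain Python. This is only a sketch and not Spark's implementation: every function name below is hypothetical, and it assumes that the JSON parser matches keys against the schema exactly while field access is resolved case-insensitively. It shows how pruning the schema with the accessor's spelling (`A`) instead of the struct's own field name (`a`) turns the value into a null.

```python
import json

# Toy model only: none of these helpers are Spark APIs.


def resolve_field(schema, requested):
    """Resolve a field name case-insensitively.

    Returns the ordinal in the schema and the name as the user wrote it.
    """
    for ordinal, name in enumerate(schema):
        if name.lower() == requested.lower():
            return ordinal, requested
    raise KeyError(requested)


def parse_array_of_structs(text, schema):
    """Parse a JSON array of objects into rows.

    Assumption: keys are matched against schema names exactly.
    """
    return [[obj.get(name) for name in schema] for obj in json.loads(text)]


def get_field_unoptimized(text, schema, requested):
    # Parse with the full schema, then pick the column by ordinal.
    ordinal, _ = resolve_field(schema, requested)
    return [row[ordinal] for row in parse_array_of_structs(text, schema)]


def get_field_buggy_pruning(text, schema, requested):
    # Prune the schema to one field, but name it after the accessor.
    _, accessor_name = resolve_field(schema, requested)
    pruned = [accessor_name]
    return [row[0] for row in parse_array_of_structs(text, pruned)]


def get_field_safe_pruning(text, schema, requested):
    # Prune the schema to one field, keeping the struct's own field name.
    ordinal, _ = resolve_field(schema, requested)
    pruned = [schema[ordinal]]
    return [row[0] for row in parse_array_of_structs(text, pruned)]


schema = ["a", "b"]
doc = '[{"a": 1, "b": 2}]'

print(get_field_unoptimized(doc, schema, "A"))    # [1]
print(get_field_buggy_pruning(doc, schema, "A"))  # [None]
print(get_field_safe_pruning(doc, schema, "A"))   # [1]
```

In this model the pruning is only safe when the pruned schema keeps the field name from the original struct, which matches the summary's point that the optimization should not change the schema of the underlying struct.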