Zheng Shao created SPARK-47670:
----------------------------------
Summary: Multiple calls to GET_JSON_OBJECT with the same JSON str
should parse it just one time
Key: SPARK-47670
URL: https://issues.apache.org/jira/browse/SPARK-47670
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 3.5.2
Reporter: Zheng Shao
For a query like the following:
{{SELECT}}
{{ GET_JSON_OBJECT(json_col, '$.a.b'),}}
{{ GET_JSON_OBJECT(json_col, '$.a.c')}}
{{FROM t}}
SparkSQL would generate a plan that parses `json_col` twice.
Ideally, SparkSQL should parse `json_col` only once. The optimizer should
detect the common JSON parsing and rewrite the plan to parse the JSON once,
extract each requested path from the parsed result, and flatten the values
back into separate output columns.
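To illustrate the intended behavior outside of Spark, here is a minimal Python sketch of "parse once, extract many paths". It is not Spark's implementation: the names `extract_paths` and `_walk` are hypothetical, only simple `$.a.b` style paths are handled, and values are returned as Python objects rather than the strings that GET_JSON_OBJECT produces.

```python
import json


def _walk(node, path):
    """Follow a simple '$.a.b' style path through parsed JSON.

    Hypothetical helper: array indexing and wildcards are not supported.
    Returns None when the path does not resolve.
    """
    if not path.startswith("$"):
        return None
    # '$.a.b' -> ['', 'a', 'b']; drop the empty leading segment.
    for key in filter(None, path[1:].split(".")):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def extract_paths(json_str, paths):
    """Parse json_str once and return one value per requested path."""
    try:
        root = json.loads(json_str)  # the single parse shared by all paths
    except (TypeError, ValueError):
        # Null or malformed input yields None for every path.
        return [None] * len(paths)
    return [_walk(root, p) for p in paths]


# One parse serves both extractions from the example query.
b, c = extract_paths('{"a": {"b": 1, "c": "x"}}', ["$.a.b", "$.a.c"])
```

The point of the sketch is that the cost of `json.loads` is paid once per row regardless of how many paths are requested, which is the sharing the optimizer rewrite would aim for.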
An alternative way to support this is the ":" notation (JSON Path), as in
other systems, where the query optimizer automatically shares a single JSON
parse across all extractions from the same column.