[ https://issues.apache.org/jira/browse/SPARK-57626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chao Sun resolved SPARK-57626.
------------------------------
    Fix Version/s: 5.0.0
       Resolution: Fixed

Issue resolved by pull request 56685
[https://github.com/apache/spark/pull/56685]

> Extend shared get_json_object parsing to nested named paths
> -----------------------------------------------------------
>
>                 Key: SPARK-57626
>                 URL: https://issues.apache.org/jira/browse/SPARK-57626
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 4.3.0
>            Reporter: Chao Sun
>            Assignee: Chao Sun
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 5.0.0
>
>
> SPARK-47670 introduced opt-in shared parsing for repeated get_json_object 
> calls over the same JSON input. Its implementation intentionally shares only 
> simple top-level fields, so repeated literal nested object paths still parse 
> the input independently.
> For example:
> {code:sql}
> SELECT
>   get_json_object(json, '$.payload.user.id') AS user_id,
>   get_json_object(json, '$.payload.user.name') AS user_name,
>   get_json_object(json, '$.payload.request_id') AS request_id
> FROM events
> {code}
> With spark.sql.optimizer.getJsonObjectSharedParsing.enabled=true, these 
> prefix-free named paths (none is an ancestor of another) should be extracted 
> in a single streaming scan, with no query changes required.
> The follow-up should preserve the existing get_json_object behavior for 
> malformed input, duplicate keys, nulls, and rendering failures. An ancestor 
> path and its descendant (for example, $.payload and $.payload.user) must not 
> share the same parse, because each requested path must keep the legacy 
> semantics it has when evaluated on its own. Dynamic paths, wildcards, array 
> subscripts, and excessively deep paths should continue using the existing 
> evaluation.
> This is distinct from SPARK-53764, which collapses nested get_json_object 
> function calls rather than sharing sibling paths over the same input.
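
The eligibility rules in the description (literal named paths only, no wildcards or array subscripts, a depth limit, and no ancestor/descendant pairs in one shared parse) can be illustrated with a small Python sketch. This is not the code from the pull request. The names `parse_named_path` and `split_shareable`, the `MAX_DEPTH` value, and the simplified field-name syntax are all assumptions made for illustration, and sending both members of an ancestor/descendant pair to the fallback is just one conservative way to satisfy the stated constraint.

```python
import re

# Hypothetical depth cap: the issue only says "excessively deep paths" fall back.
MAX_DEPTH = 8

# Simplified literal named path: "$" followed by one or more ".name" segments.
# Real get_json_object paths accept more syntax; this is an assumption.
_NAMED_PATH = re.compile(r"\$(\.[A-Za-z_][A-Za-z0-9_]*)+")


def parse_named_path(path):
    """Return the tuple of field names, or None if the path is not eligible."""
    if not isinstance(path, str) or _NAMED_PATH.fullmatch(path) is None:
        return None  # dynamic path, wildcard, or array subscript
    segments = tuple(path[2:].split("."))
    if len(segments) > MAX_DEPTH:
        return None  # too deep: keep the existing evaluation
    return segments


def is_prefix(a, b):
    """True if segment tuple a is an ancestor of, or equal to, tuple b."""
    return len(a) <= len(b) and b[:len(a)] == a


def split_shareable(paths):
    """Partition paths into (shared, fallback) lists.

    A path joins the shared group only if it is a plain named path and no
    other candidate is its ancestor or descendant.
    """
    parsed = {p: parse_named_path(p) for p in paths}
    candidates = [p for p in paths if parsed[p] is not None]
    shared = []
    fallback = [p for p in paths if parsed[p] is None]
    for p in candidates:
        conflicts = any(
            q != p
            and (is_prefix(parsed[p], parsed[q]) or is_prefix(parsed[q], parsed[p]))
            for q in candidates
        )
        (fallback if conflicts else shared).append(p)
    return shared, fallback
```

Under these assumptions, the three paths from the example query all land in the shared group, while a set such as `$.a`, `$.a.b`, `$.d[0]` would fall back entirely to per-call evaluation.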



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
