Chao Sun created SPARK-57626:
--------------------------------

             Summary: Extend shared get_json_object parsing to nested named paths
                 Key: SPARK-57626
                 URL: https://issues.apache.org/jira/browse/SPARK-57626
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 4.3.0
            Reporter: Chao Sun
            Assignee: Chao Sun


SPARK-47670 introduced opt-in shared parsing for repeated get_json_object calls 
over the same JSON input. Its implementation intentionally shares only simple 
top-level fields, so repeated literal nested object paths still parse the input 
independently.

For example:

{code:sql}
SELECT
  get_json_object(json, '$.payload.user.id') AS user_id,
  get_json_object(json, '$.payload.user.name') AS user_name,
  get_json_object(json, '$.payload.request_id') AS request_id
FROM events
{code}

With spark.sql.optimizer.getJsonObjectSharedParsing.enabled=true, these named 
paths are prefix-free (no requested path is an ancestor of another), so they 
should be extracted in one streaming scan of the input without requiring any 
query changes.
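
As a rough illustration of the idea (not the Spark implementation), the sketch 
below arranges the requested paths into a trie and resolves all of them from a 
single parse of the input. The names build_trie and extract_all are 
hypothetical. It uses Python's json.loads on the whole document instead of a 
streaming tokenizer, returns Python values rather than rendered strings, and 
does not reproduce get_json_object's handling of duplicate keys or rendering 
failures. It assumes the paths are already known to be prefix-free named paths.

```python
import json

def build_trie(paths):
    """Turn paths like '$.a.b' into nested dicts; a leaf holds the path string."""
    trie = {}
    for path in paths:
        fields = path[2:].split(".")  # drop the leading "$."
        node = trie
        for name in fields[:-1]:
            node = node.setdefault(name, {})
        node[fields[-1]] = path  # a str value marks a requested path
    return trie

def extract_all(doc, paths):
    """Parse doc once and return {path: value or None} for every path."""
    results = {p: None for p in paths}
    try:
        root = json.loads(doc)
    except (TypeError, ValueError):
        return results  # malformed or null input: every path yields None

    def walk(value, node):
        if not isinstance(value, dict):
            return  # a named field can only be looked up inside an object
        for name, child in node.items():
            if name not in value:
                continue
            if isinstance(child, str):
                results[child] = value[name]
            else:
                walk(value[name], child)

    walk(root, build_trie(paths))
    return results
```

For the query above, extract_all would be called once per row with the three 
literal paths, replacing three independent parses of the same JSON string.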

The follow-up should preserve the existing get_json_object behavior for 
malformed input, duplicate keys, nulls, and rendering failures. An ancestor path 
and its descendant must not share the same parse, because each requested path 
needs independent legacy semantics. Dynamic paths, wildcards, array subscripts, 
and excessively deep paths should continue to use the existing per-call 
evaluation.
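
A minimal sketch of these eligibility rules, under stated assumptions: the 
names parse_named_path and shareable_paths are hypothetical, MAX_DEPTH is an 
invented threshold (the issue does not give one), the regular expression 
accepts only simple identifier-like field names, and sending both the ancestor 
and the descendant to the fallback is one possible reading of the rule. A None 
path stands in for a dynamic (non-literal) path.

```python
import re

# Hypothetical depth limit; the issue does not state the real threshold.
MAX_DEPTH = 8

# Dot-separated named fields only, e.g. "$.a.b_c". Wildcards and array
# subscripts do not match. The real path grammar is broader than this.
_NAMED_PATH = re.compile(r"^\$(\.[A-Za-z_][A-Za-z0-9_]*)+$")

def parse_named_path(path):
    """Return the list of field names, or None if the path is not eligible."""
    if path is None or not _NAMED_PATH.match(path):
        return None  # dynamic, wildcard, or array-subscript path
    fields = path[2:].split(".")
    return fields if len(fields) <= MAX_DEPTH else None

def _is_prefix(a, b):
    """True if field list a is a prefix of (or equal to) field list b."""
    return len(a) <= len(b) and b[:len(a)] == a

def shareable_paths(paths):
    """Split paths into (shared, fallback) lists, preserving input order."""
    parsed = {p: parse_named_path(p) for p in paths}
    eligible = [p for p in paths if parsed[p] is not None]
    shared = []
    fallback = [p for p in paths if parsed[p] is None]
    for p in eligible:
        # An ancestor/descendant pair must not share a parse, so in this
        # sketch both members of such a pair use the existing evaluation.
        conflict = any(
            q != p and (_is_prefix(parsed[p], parsed[q])
                        or _is_prefix(parsed[q], parsed[p]))
            for q in eligible
        )
        (fallback if conflict else shared).append(p)
    return shared, fallback
```

Applied to the three paths in the example query, every path lands in the shared 
list; adding '$.payload' to the query would move it, and every path beneath it, 
to the fallback list.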

This is distinct from SPARK-53764, which collapses nested get_json_object 
function calls rather than sharing sibling paths over the same input.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

