[
https://issues.apache.org/jira/browse/SPARK-42753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Apache Spark reassigned SPARK-42753:
------------------------------------
Assignee: (was: Apache Spark)
> ReusedExchange refers to non-existent node
> ------------------------------------------
>
> Key: SPARK-42753
> URL: https://issues.apache.org/jira/browse/SPARK-42753
> Project: Spark
> Issue Type: Bug
> Components: Spark Core, Web UI
> Affects Versions: 3.4.0
> Reporter: Steven Chen
> Priority: Major
>
> There is an AQE issue where, during AQE planning, the Exchange that is
> being reused can be replaced in the plan tree. As a result, when we print the
> query plan, the ReusedExchange refers to an "unknown" Exchange. An example is
> shown below:
>
> {code:java}
> (2775) ReusedExchange [Reuses operator id: unknown]
> Output [3]: [sr_customer_sk#271, sr_store_sk#275, sum#377L]{code}
>
>
> Below is an example to demonstrate the root cause:
>
> {code:java}
> AdaptiveSparkPlan
>   |-- SomeNode X (subquery xxx)
>       |-- Exchange A
>           |-- SomeNode Y
>               |-- Exchange B
> Subquery:Hosting operator = SomeNode Hosting Expression = xxx dynamicpruning#388
> AdaptiveSparkPlan
>   |-- SomeNode M
>       |-- Exchange C
>           |-- SomeNode N
>               |-- Exchange D
> {code}
>
>
> Step 1: Exchange B is materialized and the QueryStage is added to stage cache
> Step 2: Exchange D reuses Exchange B
> Step 3: Exchange C is materialized and the QueryStage is added to stage cache
> Step 4: Exchange A reuses Exchange C
>
> Then the final plan looks like:
>
> {code:java}
> AdaptiveSparkPlan
>   |-- SomeNode X (subquery xxx)
>       |-- Exchange A -> ReusedExchange (reuses Exchange C)
> Subquery:Hosting operator = SomeNode Hosting Expression = xxx dynamicpruning#388
> AdaptiveSparkPlan
>   |-- SomeNode M
>       |-- Exchange C -> PhotonShuffleMapStage ....
>           |-- SomeNode N
>               |-- Exchange D -> ReusedExchange (reuses Exchange B)
> {code}
>
>
> As a result, the ReusedExchange (reuses Exchange B) refers to a non-existent
> node, because Exchange B was dropped from the plan when Exchange A was
> replaced. This *DOES NOT* affect query execution, but it causes the query
> visualization to malfunction in the following ways:
> # The ReusedExchange child subtree will still appear in the Spark UI graph
> but will contain no node IDs.
> # The ReusedExchange node details in the Explain plan will refer to an
> UNKNOWN node. Example below.
> {code:java}
> (2775) ReusedExchange [Reuses operator id: unknown]{code}
> # The child exchange and its subtree may be missing from the Explain text
> completely, with no node details or tree string shown.
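
The four steps in the description can be reproduced with a small toy model. The following is a minimal Python sketch, not Spark code: `Node`, `visible_names`, and `dangling_reuses` are hypothetical names introduced only for illustration, and the model assumes that Exchange B sits in the subtree under Exchange A, as the final plan in the description suggests. It shows that the reuse of Exchange B is valid after step 2 and becomes dangling once Exchange A is replaced in step 4.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a physical plan node (not a Spark class)."""
    name: str
    children: List["Node"] = field(default_factory=list)
    reuses: Optional[str] = None  # name of the exchange a ReusedExchange points at


def visible_names(root: Node) -> set:
    """Collect the names of all nodes still reachable from root."""
    names = {root.name}
    for child in root.children:
        names |= visible_names(child)
    return names


def dangling_reuses(roots: List[Node]) -> List[str]:
    """Return reuse targets that no longer appear in any of the given plans."""
    present = set()
    for root in roots:
        present |= visible_names(root)

    missing = []

    def walk(node: Node) -> None:
        if node.reuses is not None and node.reuses not in present:
            missing.append(node.reuses)
        for child in node.children:
            walk(child)

    for root in roots:
        walk(root)
    return missing


# Initial plans, mirroring the example in the description.
exchange_b = Node("Exchange B")
exchange_a = Node("Exchange A", [Node("SomeNode Y", [exchange_b])])
node_x = Node("SomeNode X", [exchange_a])
main_plan = Node("AdaptiveSparkPlan", [node_x])

node_n = Node("SomeNode N", [Node("Exchange D")])
exchange_c = Node("Exchange C", [node_n])
sub_plan = Node("AdaptiveSparkPlan (subquery)", [Node("SomeNode M", [exchange_c])])

# Steps 1-2: Exchange B is materialized, then Exchange D becomes a reuse of B.
node_n.children = [Node("ReusedExchange", reuses="Exchange B")]
before = dangling_reuses([main_plan, sub_plan])

# Steps 3-4: Exchange C is materialized, then Exchange A (together with its
# subtree, which contains Exchange B) is replaced by a reuse of C.
node_x.children = [Node("ReusedExchange", reuses="Exchange C")]
after = dangling_reuses([main_plan, sub_plan])

print(before)  # []
print(after)   # ['Exchange B']
```

In this toy model the reference only breaks at step 4, which matches the description: the reuse itself is correct when it is created, and the target disappears later when an ancestor exchange is swapped out.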