[
https://issues.apache.org/jira/browse/SPARK-26639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17511358#comment-17511358
]
Stu edited comment on SPARK-26639 at 3/24/22, 10:13 PM:
--------------------------------------------------------
Here's another example of this happening, in Spark 3.1.2. I'm running the
following code:
{code:sql}
WITH t AS (
  SELECT random() AS a
)
SELECT * FROM t
UNION
SELECT * FROM t
{code}
The CTE contains a non-deterministic function. If the CTE were computed once
and reused, the same random value would be chosen for `a` in both branches of
the union, and the output would be deduplicated into a single record.
This is not the case. The output is two records, with different random values.
On our platform, some folks like to write complex CTEs and reference them
multiple times. Recalculating these for every reference is quite
computationally expensive, so we recommend creating separate tables in these
cases, but we don't have any way to enforce this. Fixing this bug would save a
good number of compute hours!
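For illustration, the separate-table workaround mentioned above could look roughly like the following. This is a minimal sketch rather than a fix for the issue: the table name `t_materialized` is hypothetical, and it assumes the session is allowed to create a Parquet-backed table in the current database.
{code:sql}
-- Materialize the CTE body once, so every later reference reads the stored
-- rows instead of re-evaluating random(). t_materialized is a hypothetical name.
CREATE TABLE t_materialized USING parquet AS
SELECT random() AS a;

-- Both branches now read the same stored value.
SELECT * FROM t_materialized
UNION
SELECT * FROM t_materialized;

-- Clean up once the result is no longer needed.
DROP TABLE t_materialized;
{code}
Because both branches read the same stored rows, the UNION should deduplicate them into a single record, at the cost of an extra write and a manual cleanup step.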
was (Author: stubartmess): the comment above without its final paragraph.
> The reuse subquery function maybe does not work in SPARK SQL
> ------------------------------------------------------------
>
> Key: SPARK-26639
> URL: https://issues.apache.org/jira/browse/SPARK-26639
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.1.0
> Reporter: Ke Jia
> Priority: Major
>
> The subquery reuse feature was implemented in
> [https://github.com/apache/spark/pull/14548]
> In my test, the visualized plan does show the subquery being executed
> once, but the stage for the same subquery may be executed more than once.
>