Yuming Wang created SPARK-33368:
-----------------------------------

             Summary: SimplifyConditionals simplifies non-deterministic expressions
                 Key: SPARK-33368
                 URL: https://issues.apache.org/jira/browse/SPARK-33368
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.0.1, 2.4.7, 3.1.0
            Reporter: Yuming Wang


It seems that we simplify non-deterministic expressions when they are referenced through an alias. For example:
{code:sql}
CREATE TABLE t(a int, b int, c int) using parquet
{code}

{code:sql}
SELECT CASE 
 WHEN rand(100) > 1 THEN 1 
 WHEN rand(100) + 1 > 1000 THEN 1 
 WHEN rand(100) + 2 < 100 THEN 1 
 ELSE 1 
END AS x 
FROM t 
{code}

With rand(100) written inline, the plan keeps the CASE WHEN expression:
{noformat}
== Physical Plan ==
*(1) Project [CASE WHEN (rand(100) > 1.0) THEN 1 WHEN ((rand(100) + 1.0) > 1000.0) THEN 1 WHEN ((rand(100) + 2.0) < 100.0) THEN 1 ELSE 1 END AS x#6]
+- *(1) ColumnarToRow
 +- FileScan parquet default.t[] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex[file:/Users/yumwang/opensource/spark/sql/core/spark-warehouse/org.apache.spark...., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<>
{noformat}


{code:sql}
SELECT CASE 
 WHEN rd > 1 THEN 1 
 WHEN rd + 1 > 1000 THEN 1 
 WHEN rd + 2 < 100 THEN 1 
 ELSE 1 
END AS x 
FROM (SELECT *, rand(100) as rd FROM t) t1 
{code}

With rand(100) aliased as rd in a subquery, the plan collapses the CASE WHEN into the literal 1:
{noformat}
== Physical Plan ==
*(1) Project [1 AS x#1]
+- *(1) ColumnarToRow
 +- FileScan parquet default.t[] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex[file:/Users/yumwang/opensource/spark/sql/core/spark-warehouse/org.apache.spark...., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<>
{noformat}
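
One possible explanation for the difference between the two plans, sketched as a toy model in Python. This is not Spark's code: the classes `Literal`, `Rand`, `AttributeRef`, `GreaterThan`, `CaseWhen` and the function `simplify_case_when` are hypothetical stand-ins, and the guard is a simplification of whatever SimplifyConditionals really does. The assumption is that the rule only looks at the `deterministic` flag of each condition, and that a column reference such as `rd` reports itself as deterministic even though it points at `rand(100)`.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Literal:
    value: float

    @property
    def deterministic(self) -> bool:
        return True


@dataclass(frozen=True)
class Rand:
    seed: int

    @property
    def deterministic(self) -> bool:
        # A random generator is the non-deterministic leaf in this model.
        return False


@dataclass(frozen=True)
class AttributeRef:
    name: str

    @property
    def deterministic(self) -> bool:
        # Assumption: a column reference looks deterministic on its own,
        # regardless of the expression that produced the column.
        return True


@dataclass(frozen=True)
class GreaterThan:
    left: object
    right: object

    @property
    def deterministic(self) -> bool:
        return self.left.deterministic and self.right.deterministic


@dataclass(frozen=True)
class CaseWhen:
    branches: Tuple[Tuple[object, object], ...]  # (condition, value) pairs
    else_value: object


def simplify_case_when(expr: CaseWhen):
    """Return the ELSE value if every branch yields that same value and
    every condition is deterministic; otherwise return expr unchanged."""
    same_value = all(value == expr.else_value for _, value in expr.branches)
    all_deterministic = all(cond.deterministic for cond, _ in expr.branches)
    if same_value and all_deterministic:
        return expr.else_value
    return expr


one = Literal(1)
# WHEN rand(100) > 1 THEN 1 ELSE 1
inline_case = CaseWhen(((GreaterThan(Rand(100), Literal(1)), one),), one)
# WHEN rd > 1 THEN 1 ELSE 1, where rd is an alias for rand(100)
aliased_case = CaseWhen(((GreaterThan(AttributeRef("rd"), Literal(1)), one),), one)

print(simplify_case_when(inline_case))
print(simplify_case_when(aliased_case))
```

In this model the inline form is left untouched because its condition is flagged non-deterministic, while the aliased form collapses to the literal, which matches the shape of the two plans above. Whether Spark's rule behaves this way for the same reason is an assumption that would need to be checked against the optimizer source.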


