[ https://issues.apache.org/jira/browse/SPARK-32928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17199646#comment-17199646 ]
Tanel Kiis commented on SPARK-32928:
------------------------------------

One more place where this can manifest is FilterExec reordering IsNotNull predicates.

{code:title=Test SQL file}
-- Test window operator with codegen on and off.
--CONFIG_DIM1 spark.sql.codegen.wholeStage=true
--CONFIG_DIM1 spark.sql.codegen.wholeStage=false,spark.sql.codegen.factoryMode=CODEGEN_ONLY
--CONFIG_DIM1 spark.sql.codegen.wholeStage=false,spark.sql.codegen.factoryMode=NO_CODEGEN

CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUES
1, 2, 3, 4, 5, 6, 7, 8, 9, 10
AS testData(a);

SELECT a FROM testData WHERE NOT ISNULL(IF(RAND(0) > 0.5, NULL, a)) AND RAND(1) > 0.5;
{code}

{code:title=Generated output file}
-- Automatically generated by SQLQueryTestSuite
-- Number of queries: 2


-- !query
CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUES
1, 2, 3, 4, 5, 6, 7, 8, 9, 10
AS testData(a)
-- !query schema
struct<>
-- !query output



-- !query
SELECT a FROM testData WHERE NOT ISNULL(IF(RAND(0) > 0.5, NULL, a)) AND RAND(1) > 0.5
-- !query schema
struct<a:int>
-- !query output
3
4
8
{code}

{code:title=Error on running the test}
23:16:44.013 ERROR org.apache.spark.sql.SQLQueryTestSuite: Error using configs: spark.sql.codegen.wholeStage=false,spark.sql.codegen.factoryMode=CODEGEN_ONLY
[info] - deterministic.sql *** FAILED *** (1 second, 955 milliseconds)
[info] deterministic.sql
[info] Expected "3
[info] 4
[info] 8[]", but got "3
[info] 4
[info] 8[
[info] 9]" Result did not match for query #1
{code}

> Non-deterministic expressions should not be reordered inside AND and OR
> -----------------------------------------------------------------------
>
>                 Key: SPARK-32928
>                 URL: https://issues.apache.org/jira/browse/SPARK-32928
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Tanel Kiis
>            Priority: Major
>
> Using the splitDisjunctivePredicates and splitConjunctivePredicates helper
> methods can change the number of times a
> non-deterministic expression is executed. This can cause correctness issues
> on the client side.
> An existing test in the FilterPushdownSuite seems to exhibit this problem:
> {code}
>   test("generate: non-deterministic predicate referenced no generated column") {
>     val originalQuery = {
>       testRelationWithArrayType
>         .generate(Explode('c_arr), alias = Some("arr"))
>         .where(('b >= 5) && ('a + Rand(10).as("rnd") > 6) && ('col > 6))
>     }
>     val optimized = Optimize.execute(originalQuery.analyze)
>     val correctAnswer = {
>       testRelationWithArrayType
>         .where('b >= 5)
>         .generate(Explode('c_arr), alias = Some("arr"))
>         .where('a + Rand(10).as("rnd") > 6 && 'col > 6)
>         .analyze
>     }
>     comparePlans(optimized, correctAnswer)
>   }
> {code}
> In the optimized plan, the deterministic filter is moved ahead of the
> non-deterministic one:
> {code}
> Filter ((6 < none#0) AND (cast(6 as double) < (rand(10) + cast(none#0 as double))))
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
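Why reordering the conjuncts changes the result can be illustrated with a small, self-contained sketch (not Spark code). It assumes that a non-deterministic expression behaves roughly like a seeded RAND: a stateful generator that advances once per evaluation, and that AND short-circuits, so the right-hand side is skipped whenever the left-hand side is false. `ScriptedRand` and `runFilter` are hypothetical names, and the scripted values merely stand in for real random draws.

```scala
// Hypothetical stand-in for a seeded, stateful RAND(seed): it replays a fixed
// sequence and advances one position per call, so skipping a call shifts
// every value that later rows see.
final class ScriptedRand(values: Vector[Double]) {
  private var pos = 0
  def calls: Int = pos
  def next(): Double = {
    val v = values(pos % values.length)
    pos += 1
    v
  }
}

// Evaluates `p1 AND p2` with short-circuiting, in either order, where
//   p1 ~ NOT ISNULL(IF(RAND(0) > 0.5, NULL, a))
//   p2 ~ RAND(1) > 0.5
// Returns the surviving rows and how often each generator was called.
def runFilter(rows: Seq[Int], p1First: Boolean): (Seq[Int], Int, Int) = {
  val rand0 = new ScriptedRand(Vector(0.9, 0.1, 0.9, 0.1))
  val rand1 = new ScriptedRand(Vector(0.7, 0.2, 0.6, 0.3))
  def p1(a: Int): Boolean = {
    val v: Any = if (rand0.next() > 0.5) null else a // IF(RAND(0) > 0.5, NULL, a)
    v != null                                         // NOT ISNULL(...)
  }
  def p2(a: Int): Boolean = rand1.next() > 0.5
  val kept = rows.filter(a => if (p1First) p1(a) && p2(a) else p2(a) && p1(a))
  (kept, rand0.calls, rand1.calls)
}

val rows = Seq(1, 2, 3, 4)
println(runFilter(rows, p1First = true))  // (List(2),4,2)
println(runFilter(rows, p1First = false)) // (List(3),2,4)
```

The same two predicates over the same rows keep row 2 in one order and row 3 in the other, because each generator is consulted a different number of times, which mirrors the extra row `9` reported above.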