[jira] [Created] (SPARK-45171) GenerateExec fails to initialize non-deterministic expressions before use

Bruce Robbins (Jira) Thu, 14 Sep 2023 09:38:04 -0700

Bruce Robbins created SPARK-45171:
-------------------------------------

             Summary: GenerateExec fails to initialize non-deterministic expressions before use
                 Key: SPARK-45171
                 URL: https://issues.apache.org/jira/browse/SPARK-45171
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.5.0
            Reporter: Bruce Robbins



The following query fails:
{noformat}
select *
from explode(
  transform(sequence(0, cast(rand()*1000 as int) + 1), x -> x * 22)
);
{noformat}
The error is:
{noformat}
23/09/14 09:27:25 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 3)
java.lang.IllegalArgumentException: requirement failed: Nondeterministic expression org.apache.spark.sql.catalyst.expressions.Rand should be initialized before eval.
        at scala.Predef$.require(Predef.scala:281)
        at org.apache.spark.sql.catalyst.expressions.Nondeterministic.eval(Expression.scala:497)
        at org.apache.spark.sql.catalyst.expressions.Nondeterministic.eval$(Expression.scala:495)
        at org.apache.spark.sql.catalyst.expressions.RDG.eval(randomExpressions.scala:35)
        at org.apache.spark.sql.catalyst.expressions.BinaryArithmetic.eval(arithmetic.scala:384)
        at org.apache.spark.sql.catalyst.expressions.UnaryExpression.eval(Expression.scala:543)
        at org.apache.spark.sql.catalyst.expressions.BinaryArithmetic.eval(arithmetic.scala:384)
        at org.apache.spark.sql.catalyst.expressions.Sequence.eval(collectionOperations.scala:3062)
        at org.apache.spark.sql.catalyst.expressions.SimpleHigherOrderFunction.eval(higherOrderFunctions.scala:275)
        at org.apache.spark.sql.catalyst.expressions.SimpleHigherOrderFunction.eval$(higherOrderFunctions.scala:274)
        at org.apache.spark.sql.catalyst.expressions.ArrayTransform.eval(higherOrderFunctions.scala:308)
        at org.apache.spark.sql.catalyst.expressions.ExplodeBase.eval(generators.scala:375)
        at org.apache.spark.sql.execution.GenerateExec.$anonfun$doExecute$8(GenerateExec.scala:108)
...
{noformat}
However, this query succeeds:
{noformat}
select *
from explode(
  sequence(0, cast(rand()*1000 as int) + 1)
);
{noformat}
The difference is that {{transform}} turns off whole-stage codegen, so the query 
runs through the non-codegen path of {{GenerateExec}}. That path has a bug: the 
non-deterministic expression passed to the generator function is not initialized 
before it is evaluated.

An even simpler repro case is:
{noformat}
set spark.sql.codegen.wholeStage=false;

select explode(array(rand()));
{noformat}
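
For readers unfamiliar with the contract named in the error message, here is a minimal toy model in Python. It is not Spark code: the {{Rand}} class and both helper functions are hypothetical stand-ins, and the per-partition seeding is an assumption made for illustration. It only shows why calling {{eval}} without a prior {{initialize}} fails, and that initializing once per partition avoids the error.

```python
import random


class Rand:
    """Toy stand-in for a non-deterministic expression (hypothetical, not Spark's class)."""

    def __init__(self, seed):
        self.seed = seed
        self._rng = None  # stays None until initialize() is called

    def initialize(self, partition_index):
        # Assumption for illustration: derive the random stream from seed + partition index.
        self._rng = random.Random(self.seed + partition_index)

    def eval(self):
        # Mirrors the requirement in the reported error: eval() needs a prior initialize().
        if self._rng is None:
            raise ValueError(
                "Nondeterministic expression Rand should be initialized before eval."
            )
        return self._rng.random()


def generate_uninitialized(expr, partition_index, rows):
    # Models the reported behavior: the expression is evaluated without initialize().
    return [expr.eval() for _ in rows]


def generate_initialized(expr, partition_index, rows):
    # Models the expected behavior: initialize once per partition, then evaluate per row.
    expr.initialize(partition_index)
    return [expr.eval() for _ in rows]
```

In this model, {{generate_uninitialized(Rand(42), 0, [None])}} raises {{ValueError}}, while {{generate_initialized(Rand(42), 0, [None])}} returns a single value in the range [0, 1). Whether an actual fix in {{GenerateExec}} takes this shape is not stated in this report.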




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
