[jira] [Assigned] (SPARK-45171) GenerateExec fails to initialize non-deterministic expressions before use

Hyukjin Kwon (Jira) Thu, 14 Sep 2023 21:24:38 -0700


     [ 
https://issues.apache.org/jira/browse/SPARK-45171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Hyukjin Kwon reassigned SPARK-45171:
------------------------------------

    Assignee: Bruce Robbins

> GenerateExec fails to initialize non-deterministic expressions before use
> -------------------------------------------------------------------------
>
>                 Key: SPARK-45171
>                 URL: https://issues.apache.org/jira/browse/SPARK-45171
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.5.0
>            Reporter: Bruce Robbins
>            Assignee: Bruce Robbins
>            Priority: Major
>              Labels: pull-request-available
>
> The following query fails:
> {noformat}
> select *
> from explode(
>   transform(sequence(0, cast(rand()*1000 as int) + 1), x -> x * 22)
> );
> {noformat}
> The error is:
> {noformat}
> 23/09/14 09:27:25 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 3)
> java.lang.IllegalArgumentException: requirement failed: Nondeterministic expression org.apache.spark.sql.catalyst.expressions.Rand should be initialized before eval.
>       at scala.Predef$.require(Predef.scala:281)
>       at org.apache.spark.sql.catalyst.expressions.Nondeterministic.eval(Expression.scala:497)
>       at org.apache.spark.sql.catalyst.expressions.Nondeterministic.eval$(Expression.scala:495)
>       at org.apache.spark.sql.catalyst.expressions.RDG.eval(randomExpressions.scala:35)
>       at org.apache.spark.sql.catalyst.expressions.BinaryArithmetic.eval(arithmetic.scala:384)
>       at org.apache.spark.sql.catalyst.expressions.UnaryExpression.eval(Expression.scala:543)
>       at org.apache.spark.sql.catalyst.expressions.BinaryArithmetic.eval(arithmetic.scala:384)
>       at org.apache.spark.sql.catalyst.expressions.Sequence.eval(collectionOperations.scala:3062)
>       at org.apache.spark.sql.catalyst.expressions.SimpleHigherOrderFunction.eval(higherOrderFunctions.scala:275)
>       at org.apache.spark.sql.catalyst.expressions.SimpleHigherOrderFunction.eval$(higherOrderFunctions.scala:274)
>       at org.apache.spark.sql.catalyst.expressions.ArrayTransform.eval(higherOrderFunctions.scala:308)
>       at org.apache.spark.sql.catalyst.expressions.ExplodeBase.eval(generators.scala:375)
>       at org.apache.spark.sql.execution.GenerateExec.$anonfun$doExecute$8(GenerateExec.scala:108)
> ...
> {noformat}
> However, this query succeeds:
> {noformat}
> select *
> from explode(
>   sequence(0, cast(rand()*1000 as int) + 1)
> );
> {noformat}
> The difference is that {{transform}} turns off whole-stage codegen, which 
> exposes a bug in {{GenerateExec}} where the non-deterministic expression 
> passed to the generator function is not initialized before being used.
> An even simpler repro case is:
> {noformat}
> set spark.sql.codegen.wholeStage=false;
> select explode(array(rand()));
> {noformat}
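
The failure mode described above can be illustrated with a small, Spark-free model. This is only a sketch under stated assumptions: the class and function names (Rand, ArrayOf, walk, explode_partition) are hypothetical stand-ins rather than Spark's code, and the initialize_first flag only mimics the contract implied by the error message, namely that a non-deterministic expression must be initialized for its partition before eval is called. It does not show the actual change made for SPARK-45171.

```python
import random


class Rand:
    """Toy stand-in for a non-deterministic expression (hypothetical, not Spark's class)."""

    def __init__(self, seed):
        self.seed = seed
        self._rng = None  # created by initialize(), one generator per partition

    def initialize(self, partition_index):
        # Seed per partition so each partition gets its own reproducible stream.
        self._rng = random.Random(self.seed + partition_index)

    def eval(self):
        if self._rng is None:
            # Mirrors the wording of the requirement failure in the stack trace.
            raise ValueError(
                "Nondeterministic expression Rand should be initialized before eval."
            )
        return self._rng.random()


class ArrayOf:
    """Toy deterministic parent expression that evaluates its children into a list."""

    def __init__(self, *children):
        self.children = children

    def eval(self):
        return [child.eval() for child in self.children]


def walk(expr):
    """Yield expr and all of its descendants."""
    yield expr
    for child in getattr(expr, "children", ()):
        yield from walk(child)


def explode_partition(expr, partition_index, rows, initialize_first):
    """Toy interpreted 'generate' step: one output row per array element."""
    if initialize_first:
        # Initialize every non-deterministic node in the tree before evaluating.
        for node in walk(expr):
            if hasattr(node, "initialize"):
                node.initialize(partition_index)
    out = []
    for _ in range(rows):
        out.extend(expr.eval())
    return out
```

Calling explode_partition(ArrayOf(Rand(seed=42)), 0, 1, initialize_first=False) raises the "should be initialized before eval" error, which is the analogue of the interpreted (non-codegen) path in the report. Passing initialize_first=True walks the expression tree first and the call succeeds.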



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
