[ https://issues.apache.org/jira/browse/SPARK-40963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bruce Robbins updated SPARK-40963:
----------------------------------
    Description:
Example:
{noformat}
select c1, explode(c4) as c5 from (
  select c1, array(c3) as c4 from (
    select c1, explode_outer(c2) as c3
    from values
    (1, array(1, 2)),
    (2, array(2, 3)),
    (3, null)
    as data(c1, c2)
  )
);

+---+---+
|c1 |c5 |
+---+---+
|1  |1  |
|1  |2  |
|2  |2  |
|2  |3  |
|3  |0  |
+---+---+
{noformat}
In the last row, {{c5}} is 0, but should be {{NULL}}.

At the time {{CreateArray(c3)}} is instantiated, c3's nullability is incorrect because the new projection created by {{ExtractGenerator}} uses {{generatorOutput}} from {{explode_outer(c2)}} as a projection list, but {{generatorOutput}} doesn't take into account that {{explode_outer(c2)}} is an _outer_ explode, so the nullability setting is lost.

Another example:
{noformat}
select c1, inline_outer(c4) from (
  select c1, array(c3) as c4 from (
    select c1, explode_outer(c2) as c3
    from values
    (1, array(named_struct('a', 1, 'b', 2))),
    (2, array(named_struct('a', 3, 'b', 4), named_struct('a', 5, 'b', 6))),
    (3, null)
    as data(c1, c2)
  )
);

22/10/27 11:53:20 ERROR Executor: Exception in task 1.0 in stage 1.0 (TID 2)
java.lang.NullPointerException
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.generate_doConsume_1$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.generate_doConsume_0$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:364)
{noformat}

> ExtractGenerator sets incorrect nullability in new Project
> ----------------------------------------------------------
>
>                 Key: SPARK-40963
>                 URL: https://issues.apache.org/jira/browse/SPARK-40963
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.1.3, 3.2.2, 3.4.0, 3.3.1
>            Reporter: Bruce Robbins
>            Priority: Major
>              Labels: correctness
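
The nullability loss described in this issue can be illustrated with a small toy model. This is only a sketch in plain Python, not Spark code: {{Attribute}}, {{generator_output}}, {{project_buggy}}, {{project_fixed}}, {{create_array_contains_null}} and {{read_element}} are hypothetical names invented for illustration, and {{project_fixed}} shows one plausible correction, not necessarily the change made in Spark. The model assumes that a consumer which believes an array has no null elements skips the null check and reads a primitive default of 0.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Attribute:
    """Toy stand-in for an output column: a name plus a nullability flag."""
    name: str
    nullable: bool


def generator_output(element_nullable: bool) -> List[Attribute]:
    # Toy explode: the output column is nullable only if the array elements are.
    return [Attribute("c3", element_nullable)]


def project_buggy(gen_output: List[Attribute], outer: bool) -> List[Attribute]:
    # Reuses the generator output unchanged and ignores `outer`.
    return list(gen_output)


def project_fixed(gen_output: List[Attribute], outer: bool) -> List[Attribute]:
    # An outer generator emits a null row for null or empty input,
    # so every generated column is forced to nullable (assumed correction).
    return [replace(a, nullable=True) if outer else a for a in gen_output]


def create_array_contains_null(child: Attribute) -> bool:
    # Toy array(c3): the array may contain nulls only if its child is nullable.
    return child.nullable


def read_element(value: Optional[int], contains_null: bool) -> Optional[int]:
    # Toy consumer: without a null check, a missing value reads as the int default 0.
    if contains_null:
        return value
    return 0 if value is None else value


# Row (3, null): explode_outer produces a null c3 even though elements are non-nullable ints.
buggy = project_buggy(generator_output(element_nullable=False), outer=True)
fixed = project_fixed(generator_output(element_nullable=False), outer=True)

print(read_element(None, create_array_contains_null(buggy[0])))  # 0
print(read_element(None, create_array_contains_null(fixed[0])))  # None
```

In this model the buggy projection reports {{c3}} as non-nullable, so the downstream read yields 0 for the row with a null array, which mirrors the incorrect {{c5}} value in the first example.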