Ziqi Liu created SPARK-49460:
--------------------------------
Summary: NPE error in EmptyRelationExec.cleanupResources()
Key: SPARK-49460
URL: https://issues.apache.org/jira/browse/SPARK-49460
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.5.2, 4.0.0, 3.5.3
Reporter: Ziqi Liu
This bug was introduced in [https://github.com/apache/spark/pull/46830] where
it wrongly assumes {{cleanupResources}} will not be called on the executor.
A simple repro
```
spark.sql("create table t1left (a int, b int);")
spark.sql("insert into t1left values (1, 1), (2,2), (3,3);")
spark.sql("create table t1right (a int, b int);")
spark.sql("create table t1empty (a int, b int);")
spark.sql("insert into t1right values (2,20), (4, 40);")
spark.sql("""
|with leftT as (
| with erp as (
| select
| *
| from
| t1left
| join t1empty on t1left.a = t1empty.a
| join t1right on t1left.a = t1right.a
| )
| SELECT
| CASE
| WHEN COUNT(*) = 0 THEN 4
| ELSE NULL
| END AS a
| FROM
| erp
| HAVING
| COUNT(*) = 0
|)
|select
| /*+ MERGEJOIN(t1right) */
| *
|from
| leftT
| join t1right on leftT.a = t1right.a""").collect()
```
error stacktrace:
```
24/08/29 17:52:08 ERROR TaskSetManager: Task 0 in stage 9.0 failed 1 times;
aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in
stage 9.0 failed 1 times, most recent failure: Lost task 0.0 in stage 9.0 (TID
10) (192.168.3.181 executor driver): java.lang.NullPointerException: Cannot
invoke
"org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.foreach(scala.Function1)"
because the return value of
"org.apache.spark.sql.execution.EmptyRelationExec.logical()" is null
at
org.apache.spark.sql.execution.EmptyRelationExec.cleanupResources(EmptyRelationExec.scala:86)
at
org.apache.spark.sql.execution.SparkPlan.$anonfun$cleanupResources$1(SparkPlan.scala:571)
at
org.apache.spark.sql.execution.SparkPlan.$anonfun$cleanupResources$1$adapted(SparkPlan.scala:571)
at scala.collection.immutable.Vector.foreach(Vector.scala:2124)
....
at
org.apache.spark.sql.execution.SparkPlan.cleanupResources(SparkPlan.scala:571)
at
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage8.processNext(Unknown
Source)
at
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at
org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:50)
at
org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:388)
at
org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:901)
at
org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:901)
at
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:374)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:338)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
at
org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:171)
at org.apache.spark.scheduler.Task.run(Task.scala:146)
at
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$5(Executor.scala:644)
```
```
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]