jiahong.li created SPARK-38333:
----------------------------------
Summary: DPP cause DataSourceScanExec java.lang.NullPointerException
Key: SPARK-38333
URL: https://issues.apache.org/jira/browse/SPARK-38333
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 3.1.2
Reporter: jiahong.li
With dynamic partition pruning (DPP), we hit an NPE like the one below:
Caused by: java.lang.NullPointerException
    at org.apache.spark.sql.execution.DataSourceScanExec.$init$(DataSourceScanExec.scala:57)
    at org.apache.spark.sql.execution.FileSourceScanExec.<init>(DataSourceScanExec.scala:172)
    ...
    at org.apache.spark.sql.catalyst.expressions.CodeGeneratorWithInterpretedFallback.createObject(CodeGeneratorWithInterpretedFallback.scala:56)
    at org.apache.spark.sql.catalyst.expressions.Predicate$.create(predicates.scala:101)
    at org.apache.spark.sql.execution.FilterExec.$anonfun$doExecute$2(basicPhysicalOperators.scala:246)
    at org.apache.spark.sql.execution.FilterExec.$anonfun$doExecute$2$adapted(basicPhysicalOperators.scala:245)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndexInternal$2(RDD.scala:885)
The root cause is the addExprTree function in EquivalentExpressions:
```
def addExprTree(
    expr: Expression,
    addFunc: Expression => Boolean = addExpr): Unit = {
  val skip = expr.isInstanceOf[LeafExpression] ||
    // `LambdaVariable` is usually used as a loop variable, which can't be evaluated ahead of the
    // loop. So we can't evaluate sub-expressions containing `LambdaVariable` at the beginning.
    expr.find(_.isInstanceOf[LambdaVariable]).isDefined ||
    // `PlanExpression` wraps query plan. To compare query plans of `PlanExpression` on executor,
    // can cause error like NPE.
    (expr.isInstanceOf[PlanExpression[_]] && TaskContext.get != null)

  if (!skip && !addFunc(expr)) {
    childrenToRecurse(expr).foreach(addExprTree(_, addFunc))
    commonChildrenToRecurse(expr).filter(_.nonEmpty).foreach(addCommonExprs(_, addFunc))
  }
}
```
Maybe we should change the last condition of `skip` to this:
```
(expr.find(_.isInstanceOf[PlanExpression[_]]).isDefined && TaskContext.get != null)
```
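This is because, in DPP, the filter expression looks like this:
DynamicPruningExpression(InSubqueryExec(value, broadcastValues, exprId))
Here the PlanExpression (InSubqueryExec) is a child of DynamicPruningExpression, not the root, so the current isInstanceOf check on the root does not match it. We should search the whole expression tree instead: if a PlanExpression such as InSubqueryExec is found anywhere in it, addExprTree should skip the expression, and the NPE will no longer occur.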
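
To illustrate the difference between the two checks, here is a minimal, self-contained sketch. It does not use Spark: `Expr`, `PlanExpr`, `Attr`, `InSubquery` and `DynamicPruning` are hypothetical stand-ins that only mimic the shape of the Catalyst classes named above, and the `TaskContext.get != null` part of the condition is left out.

```scala
// Toy model only: these types mimic the shape of Catalyst expressions.
// They are NOT the real Spark classes.
sealed trait Expr {
  def children: Seq[Expr]

  // Pre-order search over the tree, similar in spirit to the `find`
  // call used in addExprTree. Returns the first node matching `p`.
  def find(p: Expr => Boolean): Option[Expr] =
    if (p(this)) Some(this)
    else children.foldLeft(Option.empty[Expr])((acc, c) => acc.orElse(c.find(p)))
}

// Stand-in for PlanExpression[_].
sealed trait PlanExpr extends Expr

// Stand-in for a leaf attribute reference.
final case class Attr(name: String) extends Expr {
  def children: Seq[Expr] = Nil
}

// Stand-in for InSubqueryExec (a PlanExpression).
final case class InSubquery(value: Expr) extends PlanExpr {
  def children: Seq[Expr] = Seq(value)
}

// Stand-in for DynamicPruningExpression (not a PlanExpression itself).
final case class DynamicPruning(child: Expr) extends Expr {
  def children: Seq[Expr] = Seq(child)
}

object SkipCheck {
  // Current check: inspects only the root node.
  def rootOnly(e: Expr): Boolean = e.isInstanceOf[PlanExpr]

  // Proposed check: inspects the root and all of its descendants.
  def anywhere(e: Expr): Boolean = e.find(_.isInstanceOf[PlanExpr]).isDefined

  // Same shape as the DPP filter: DynamicPruning(InSubquery(value)).
  val dppFilter: Expr = DynamicPruning(InSubquery(Attr("value")))

  def main(args: Array[String]): Unit = {
    println(s"rootOnly: ${rootOnly(dppFilter)}") // false: the root is not a PlanExpr
    println(s"anywhere: ${anywhere(dppFilter)}") // true: the nested InSubquery is found
  }
}
```

In this toy model the root-only check returns false for the DPP-shaped filter, so the expression would not be skipped, while the tree-wide check returns true.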