jiahong.li created SPARK-38333:
----------------------------------

             Summary: DPP cause DataSourceScanExec java.lang.NullPointerException
                 Key: SPARK-38333
                 URL: https://issues.apache.org/jira/browse/SPARK-38333
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.1.2
            Reporter: jiahong.li


With dynamic partition pruning (DPP), we hit an NPE like the one below:

Caused by: java.lang.NullPointerException
    at org.apache.spark.sql.execution.DataSourceScanExec.$init$(DataSourceScanExec.scala:57)
    at org.apache.spark.sql.execution.FileSourceScanExec.<init>(DataSourceScanExec.scala:172)

...

    at org.apache.spark.sql.catalyst.expressions.CodeGeneratorWithInterpretedFallback.createObject(CodeGeneratorWithInterpretedFallback.scala:56)
    at org.apache.spark.sql.catalyst.expressions.Predicate$.create(predicates.scala:101)
    at org.apache.spark.sql.execution.FilterExec.$anonfun$doExecute$2(basicPhysicalOperators.scala:246)
    at org.apache.spark.sql.execution.FilterExec.$anonfun$doExecute$2$adapted(basicPhysicalOperators.scala:245)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndexInternal$2(RDD.scala:885)

The root cause is the addExprTree function in EquivalentExpressions:

```
def addExprTree(
    expr: Expression,
    addFunc: Expression => Boolean = addExpr): Unit = {
  val skip = expr.isInstanceOf[LeafExpression] ||
    // `LambdaVariable` is usually used as a loop variable, which can't be evaluated ahead of the
    // loop. So we can't evaluate sub-expressions containing `LambdaVariable` at the beginning.
    expr.find(_.isInstanceOf[LambdaVariable]).isDefined ||
    // `PlanExpression` wraps query plan. To compare query plans of `PlanExpression` on executor,
    // can cause error like NPE.
    (expr.isInstanceOf[PlanExpression[_]] && TaskContext.get != null)

  if (!skip && !addFunc(expr)) {
    childrenToRecurse(expr).foreach(addExprTree(_, addFunc))
    commonChildrenToRecurse(expr).filter(_.nonEmpty).foreach(addCommonExprs(_, addFunc))
  }
}
```

Maybe we should change the last condition to this:
```
(expr.find(_.isInstanceOf[PlanExpression[_]]).isDefined && TaskContext.get != null)
```

This is because, in DPP, the filter expression looks like this:

DynamicPruningExpression(InSubqueryExec(value, broadcastValues, exprId))

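The PlanExpression here (InSubqueryExec) is a child of the filter expression, not the
expression itself, so the top-level isInstanceOf check does not match it. We should
therefore search the children as well: if a PlanExpression such as InSubqueryExec is
found anywhere in the tree, addExprTree should skip that expression, and the NPE no
longer occurs.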
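
To illustrate the difference between the two checks, here is a minimal sketch that
does not use Spark at all. The types Expr, PlanExpr, Attr, InSubquery and
DynamicPruning are hypothetical stand-ins for Expression, PlanExpression[_], a leaf
attribute, InSubqueryExec and DynamicPruningExpression; the find method only imitates
the pre-order search that the snippet above calls on expr. The TaskContext.get != null
part of the condition is left out because it cannot be reproduced outside an executor.

```scala
// Toy stand-ins for Catalyst classes (assumption: only the names mirror Spark).
sealed trait Expr {
  def children: Seq[Expr]

  // Pre-order search: return the first node (this one or a descendant) matching p.
  def find(p: Expr => Boolean): Option[Expr] =
    if (p(this)) Some(this)
    else children.iterator.map(_.find(p)).collectFirst { case Some(e) => e }
}

// Plays the role of PlanExpression[_]: an expression that wraps a query plan.
trait PlanExpr extends Expr

final case class Attr(name: String) extends Expr {
  val children: Seq[Expr] = Nil
}
final case class InSubquery(value: Expr) extends PlanExpr {
  val children: Seq[Expr] = Seq(value)
}
final case class DynamicPruning(child: Expr) extends Expr {
  val children: Seq[Expr] = Seq(child)
}

object SkipCheckDemo {
  // Current check: only looks at the root of the expression tree.
  def skipTopLevelOnly(e: Expr): Boolean = e.isInstanceOf[PlanExpr]

  // Proposed check: looks at the root and every descendant.
  def skipAnyNested(e: Expr): Boolean = e.find(_.isInstanceOf[PlanExpr]).isDefined

  // Same shape as the DPP filter: the plan expression is wrapped, not at the root.
  val dppFilter: Expr = DynamicPruning(InSubquery(Attr("value")))

  def main(args: Array[String]): Unit = {
    println(skipTopLevelOnly(dppFilter)) // false: the wrapper hides the plan expression
    println(skipAnyNested(dppFilter))    // true: find reaches the nested InSubquery
  }
}
```

In this toy model the top-level check lets the DPP-shaped filter through, while the
find-based check skips it, which is the behaviour the proposed change is after.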



