Xiao Li created SPARK-20273:
-------------------------------

             Summary: No non-deterministic Filter push-down into Join Conditions
                 Key: SPARK-20273
                 URL: https://issues.apache.org/jira/browse/SPARK-20273
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.1.0
            Reporter: Xiao Li
            Assignee: Xiao Li

{noformat}
sql("SELECT t1.b, rand(0) as r FROM cachedData, cachedData t1 GROUP BY t1.b having r > 0.5").show()
{noformat}

We will get the following error:

{noformat}
Job aborted due to stage failure: Task 1 in stage 4.0 failed 1 times, most recent failure: Lost task 1.0 in stage 4.0 (TID 8, localhost, executor driver): java.lang.NullPointerException
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificPredicate.eval(Unknown Source)
	at org.apache.spark.sql.execution.joins.BroadcastNestedLoopJoinExec$$anonfun$org$apache$spark$sql$execution$joins$BroadcastNestedLoopJoinExec$$boundCondition$1.apply(BroadcastNestedLoopJoinExec.scala:87)
	at org.apache.spark.sql.execution.joins.BroadcastNestedLoopJoinExec$$anonfun$org$apache$spark$sql$execution$joins$BroadcastNestedLoopJoinExec$$boundCondition$1.apply(BroadcastNestedLoopJoinExec.scala:87)
	at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:463)
{noformat}

Filters can be pushed down into join conditions by the optimizer rule {{PushPredicateThroughJoin}}. However, the analyzer blocks users from adding non-deterministic conditions to a join themselves (for details, see the PR https://github.com/apache/spark/pull/7535). We should not push down non-deterministic conditions; otherwise, we should allow users to write such join conditions too, by explicitly initializing the non-deterministic expressions.
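
The first option above (do not push down non-deterministic conditions) can be illustrated with a small, self-contained sketch. It is a toy model only: the types {{Pred}}, {{Cmp}}, {{RandCmp}}, {{And}} and the object {{PushDownSketch}} are hypothetical names invented for this illustration and are not Spark's Catalyst classes, and the sketch makes no claim about how {{PushPredicateThroughJoin}} is or will be implemented. It only shows the idea of splitting a filter condition into conjuncts and keeping the non-deterministic ones out of the join condition.

```scala
// Toy predicate model. These types are illustrative assumptions,
// not Spark's Catalyst expression classes.
sealed trait Pred { def deterministic: Boolean }

// A comparison against a column value, e.g. t1.b > 1.0 (deterministic).
final case class Cmp(column: String, op: String, value: Double) extends Pred {
  val deterministic = true
}

// A comparison against a random value, e.g. rand(0) > 0.5 (non-deterministic).
final case class RandCmp(seed: Long, op: String, value: Double) extends Pred {
  val deterministic = false
}

// Conjunction of two predicates; deterministic only if both sides are.
final case class And(left: Pred, right: Pred) extends Pred {
  def deterministic: Boolean = left.deterministic && right.deterministic
}

object PushDownSketch {
  // Flatten nested ANDs into a flat list of conjuncts, left to right.
  def splitConjuncts(p: Pred): List[Pred] = p match {
    case And(l, r) => splitConjuncts(l) ++ splitConjuncts(r)
    case other     => List(other)
  }

  // Returns (candidates for push-down into the join condition,
  //          conjuncts that must stay in the Filter above the join).
  def partitionForPushDown(condition: Pred): (List[Pred], List[Pred]) =
    splitConjuncts(condition).partition(_.deterministic)
}
```

For example, with a hypothetical condition that combines a deterministic conjunct with the {{rand(0) > 0.5}} predicate from the query above, {{PushDownSketch.partitionForPushDown(And(Cmp("t1.b", ">", 1.0), RandCmp(0L, ">", 0.5)))}} returns the {{Cmp}} as the only push-down candidate and leaves the {{RandCmp}} in the filter.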