[ 
https://issues.apache.org/jira/browse/SPARK-20273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-20273:
----------------------------
    Summary: Disallow Non-deterministic Filter push-down into Join Conditions  
(was: No non-deterministic Filter push-down into Join Conditions)

> Disallow Non-deterministic Filter push-down into Join Conditions
> ----------------------------------------------------------------
>
>                 Key: SPARK-20273
>                 URL: https://issues.apache.org/jira/browse/SPARK-20273
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Xiao Li
>            Assignee: Xiao Li
>
> {noformat}
> sql("SELECT t1.b, rand(0) as r FROM cachedData, cachedData t1 GROUP BY t1.b having r > 0.5").show()
> {noformat}
> We will get the following error:
> {noformat}
> Job aborted due to stage failure: Task 1 in stage 4.0 failed 1 times, most recent failure: Lost task 1.0 in stage 4.0 (TID 8, localhost, executor driver): java.lang.NullPointerException
>       at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificPredicate.eval(Unknown Source)
>       at org.apache.spark.sql.execution.joins.BroadcastNestedLoopJoinExec$$anonfun$org$apache$spark$sql$execution$joins$BroadcastNestedLoopJoinExec$$boundCondition$1.apply(BroadcastNestedLoopJoinExec.scala:87)
>       at org.apache.spark.sql.execution.joins.BroadcastNestedLoopJoinExec$$anonfun$org$apache$spark$sql$execution$joins$BroadcastNestedLoopJoinExec$$boundCondition$1.apply(BroadcastNestedLoopJoinExec.scala:87)
>       at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:463)
> {noformat}
> Filters can be pushed down into the join conditions by the optimizer rule 
> {{PushPredicateThroughJoin}}. However, the analyzer blocks users from adding 
> non-deterministic conditions to joins (for details, see the PR 
> https://github.com/apache/spark/pull/7535). 
> We should not push down non-deterministic conditions; otherwise, we should 
> allow users to do it by explicitly initializing the non-deterministic 
> expressions.
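
The rule proposed above (keep non-deterministic predicates out of the join condition) can be illustrated with a small toy model. This is only a sketch under stated assumptions: `Predicate`, `PushDownSketch`, and `split` are hypothetical names invented for illustration, not Spark's Catalyst classes or the actual patch, and the predicate `t1.b > 1` is an illustrative example that does not come from the issue.

```scala
// Toy model only: these types are hypothetical and are NOT Spark's Catalyst classes.
final case class Predicate(sql: String, deterministic: Boolean)

object PushDownSketch {
  /** Splits filter conjuncts into (candidates that may move into the join
    * condition, predicates that must stay in the Filter above the join). */
  def split(conjuncts: Seq[Predicate]): (Seq[Predicate], Seq[Predicate]) =
    conjuncts.partition(_.deterministic)

  def main(args: Array[String]): Unit = {
    val conjuncts = Seq(
      Predicate("t1.b > 1", deterministic = true),       // illustrative predicate
      Predicate("rand(0) > 0.5", deterministic = false)  // from the failing query
    )
    val (pushable, retained) = split(conjuncts)
    println(s"join condition:    ${pushable.map(_.sql).mkString(" AND ")}")
    println(s"filter above join: ${retained.map(_.sql).mkString(" AND ")}")
  }
}
```

Under this sketch, only deterministic conjuncts are candidates for the join condition, while `rand(0) > 0.5` remains in the filter that is evaluated above the join.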



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
