Xiao Li created SPARK-20273:
-------------------------------

             Summary: No non-deterministic Filter push-down into Join Conditions
                 Key: SPARK-20273
                 URL: https://issues.apache.org/jira/browse/SPARK-20273
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.1.0
            Reporter: Xiao Li
            Assignee: Xiao Li


{noformat}
sql("SELECT t1.b, rand(0) as r FROM cachedData, cachedData t1 GROUP BY t1.b having r > 0.5").show()
{noformat}

Running the query above fails with the following error:
{noformat}
Job aborted due to stage failure: Task 1 in stage 4.0 failed 1 times, most recent failure: Lost task 1.0 in stage 4.0 (TID 8, localhost, executor driver): java.lang.NullPointerException
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificPredicate.eval(Unknown Source)
        at org.apache.spark.sql.execution.joins.BroadcastNestedLoopJoinExec$$anonfun$org$apache$spark$sql$execution$joins$BroadcastNestedLoopJoinExec$$boundCondition$1.apply(BroadcastNestedLoopJoinExec.scala:87)
        at org.apache.spark.sql.execution.joins.BroadcastNestedLoopJoinExec$$anonfun$org$apache$spark$sql$execution$joins$BroadcastNestedLoopJoinExec$$boundCondition$1.apply(BroadcastNestedLoopJoinExec.scala:87)
        at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:463)
{noformat}

The optimizer rule {{PushPredicateThroughJoin}} can push Filter predicates 
down into join conditions. However, the analyzer blocks users from adding 
non-deterministic conditions to a join (for details, see the PR 
https://github.com/apache/spark/pull/7535), so the optimizer can end up 
producing a join condition that the analyzer would have rejected.

We should not push down non-deterministic conditions. Otherwise, we should 
also allow users to write such join conditions themselves, by explicitly 
initializing the non-deterministic expressions.
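
As a rough illustration of the first option, the rule could split the Filter 
condition into its conjuncts and hand only the deterministic ones to the join, 
leaving the rest in a Filter above it. The sketch below is a toy model in plain 
Scala, not Catalyst code: Expr, Column, Literal, Rand, GreaterThan, And and 
PushDownSketch are hypothetical stand-ins, and only the deterministic flag is 
meant to correspond to the property discussed here.

```scala
// Toy expression tree. These are NOT Spark's Catalyst classes; every name
// here is a hypothetical stand-in used only to illustrate the idea.
sealed trait Expr { def deterministic: Boolean }
final case class Column(name: String) extends Expr { val deterministic = true }
final case class Literal(value: Double) extends Expr { val deterministic = true }
final case class Rand(seed: Long) extends Expr { val deterministic = false }
final case class GreaterThan(left: Expr, right: Expr) extends Expr {
  // A comparison is deterministic only if both operands are.
  val deterministic: Boolean = left.deterministic && right.deterministic
}
final case class And(left: Expr, right: Expr) extends Expr {
  val deterministic: Boolean = left.deterministic && right.deterministic
}

object PushDownSketch {
  // Flatten a tree of ANDs into its individual conjuncts.
  def splitConjuncts(e: Expr): Seq[Expr] = e match {
    case And(l, r) => splitConjuncts(l) ++ splitConjuncts(r)
    case other     => Seq(other)
  }

  // Returns (conjuncts that may become join conditions,
  //          conjuncts that must stay in the Filter above the join).
  def partitionForJoin(filter: Expr): (Seq[Expr], Seq[Expr]) =
    splitConjuncts(filter).partition(_.deterministic)

  def main(args: Array[String]): Unit = {
    // Assumed example filter: t1.b > 1.0 AND rand(0) > 0.5
    val filter = And(
      GreaterThan(Column("t1.b"), Literal(1.0)),
      GreaterThan(Rand(0L), Literal(0.5)))
    val (pushable, retained) = partitionForJoin(filter)
    println(s"pushable: $pushable")
    println(s"retained: $retained")
  }
}
```

In this model the rand(0) comparison never reaches the join, so the join 
operator never has to evaluate an uninitialized non-deterministic expression.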



