Ravindra Bajpai created SPARK-20008:
---------------------------------------
Summary:
hiveContext.emptyDataFrame.except(hiveContext.emptyDataFrame).count() returns 1
Key: SPARK-20008
URL: https://issues.apache.org/jira/browse/SPARK-20008
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 2.0.2
Reporter: Ravindra Bajpai
hiveContext.emptyDataFrame.except(hiveContext.emptyDataFrame).count() yields 1
instead of the expected 0.
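A minimal reproduction in spark-shell, assuming the equivalent call through the
SparkSession wrapper behaves the same way the report describes (the res
numbering is illustrative):

scala> spark.emptyDataFrame.except(spark.emptyDataFrame).count()
res0: Long = 1    // expected 0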
This was not the case with Spark 1.5.2. From a usage point of view this is an
API behavior change, and hence I consider it a bug. It may be a boundary case;
I am not sure.

Workaround: for now I check that both counts are != 0 before this operation,
which is not good for performance. Hence I am creating a JIRA to track it.
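A sketch of that workaround, using a hypothetical helper named safeExcept (the
name is illustrative, not part of the Spark API):

import org.apache.spark.sql.DataFrame

// Guard except() with count checks. Both count() calls force a full
// evaluation of their DataFrame, which is the performance cost noted above.
def safeExcept(left: DataFrame, right: DataFrame): DataFrame =
  if (left.count() == 0 || right.count() == 0) left.distinct()
  else left.except(right)

// distinct() is used in the short-circuit branch because except() is
// equivalent to EXCEPT DISTINCT in SQL, i.e. it deduplicates its result.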
As Young Zhang explained in reply to my mail: starting from Spark 2, this kind
of operation is implemented as a left anti join instead of using RDD operations
directly.
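That planning choice can also be seen on non-empty DataFrames; a small
illustration (the column name id is arbitrary):

scala> spark.range(3).toDF("id").except(spark.range(1).toDF("id")).explain()

The resulting physical plan likewise contains a LeftAnti join rather than an
RDD-level subtract.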
The same issue also occurs with sqlContext:
scala> spark.version
res25: String = 2.0.2

scala> spark.sqlContext.emptyDataFrame.except(spark.sqlContext.emptyDataFrame).explain(true)
== Physical Plan ==
*HashAggregate(keys=[], functions=[], output=[])
+- Exchange SinglePartition
   +- *HashAggregate(keys=[], functions=[], output=[])
      +- BroadcastNestedLoopJoin BuildRight, LeftAnti, false
         :- Scan ExistingRDD[]
         +- BroadcastExchange IdentityBroadcastMode
            +- Scan ExistingRDD[]
This arguably means a bug. But my guess is that it is likely the logic of
comparing NULL = NULL, and whether that should return true or false, that is
causing this kind of confusion.
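As a minimal illustration of the NULL-comparison point (standard Spark SQL
semantics, independent of this bug): NULL = NULL evaluates to NULL, while the
null-safe operator <=> returns true.

scala> spark.sql("SELECT null = null AS eq, null <=> null AS nullSafeEq").show()
+----+----------+
|  eq|nullSafeEq|
+----+----------+
|null|      true|
+----+----------+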