Ram Kandasamy created SPARK-11427:
-------------------------------------

             Summary: DataFrame's intersect method does not work, returns 1
                 Key: SPARK-11427
                 URL: https://issues.apache.org/jira/browse/SPARK-11427
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.5.0
            Reporter: Ram Kandasamy


Hello,

While working with DataFrames, I found that the count of an intersect() result
always seems to be 1. The RDD intersection() method works properly.

Consider this example:
scala> val firstFile = sqlContext.read.parquet("/Users/ramkandasamy/sparkData/2015-07-25/*").select("id").distinct
firstFile: org.apache.spark.sql.DataFrame = [id: string]

scala> firstFile.count
res4: Long = 1072046

scala> firstFile.intersect(firstFile).count
res5: Long = 1

scala> firstFile.rdd.intersection(firstFile.rdd).count
res6: Long = 1072046


I have tried various cases, and for some reason the count of the DataFrame's
intersect() result is always 1.
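
Since the RDD intersection() call in the session above returns the expected count, one possible interim workaround is to intersect through the RDD API and rebuild a DataFrame from the result. This is only a sketch: intersectViaRdd is a hypothetical helper name, not part of Spark, and it assumes both DataFrames share the same schema.

```scala
import org.apache.spark.sql.{DataFrame, SQLContext}

// Hypothetical helper (not part of Spark): intersects two DataFrames by
// going through the RDD API instead of DataFrame.intersect.
def intersectViaRdd(sqlContext: SQLContext, left: DataFrame, right: DataFrame): DataFrame = {
  // Assumption: both inputs have identical schemas.
  require(left.schema == right.schema, "schemas must match")
  // RDD.intersection returns the distinct rows present in both RDDs.
  val rows = left.rdd.intersection(right.rdd)
  // Rebuild a DataFrame from the surviving rows using the original schema.
  sqlContext.createDataFrame(rows, left.schema)
}
```

In the shell session above this would be called as intersectViaRdd(sqlContext, firstFile, firstFile).count. It gives up whatever optimization the DataFrame path would apply, so it is a stopgap rather than a fix for the underlying bug.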



