Ram Kandasamy created SPARK-11427:
-------------------------------------
Summary: DataFrame's intersect method does not work, returns 1
Key: SPARK-11427
URL: https://issues.apache.org/jira/browse/SPARK-11427
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.5.0
Reporter: Ram Kandasamy
Hello,
While working with DataFrames, I found that the intersect() method seems to
always return a result whose count is 1. The RDD intersection() method works
correctly on the same data.
Consider this example:
scala> val firstFile = sqlContext.read.parquet("/Users/ramkandasamy/sparkData/2015-07-25/*").select("id").distinct
firstFile: org.apache.spark.sql.DataFrame = [id: string]
scala> firstFile.count
res4: Long = 1072046
scala> firstFile.intersect(firstFile).count
res5: Long = 1
scala> firstFile.rdd.intersection(firstFile.rdd).count
res6: Long = 1072046
I have tried various cases, and in every one the DataFrame's intersect()
method returns a count of 1.
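The example above depends on local parquet files. A minimal sketch of the same
comparison on in-memory data might look like the following. It assumes a Spark
1.5 spark-shell session where `sc` and `sqlContext` are predefined; the sample
ids and the value names (`ids`, `total`, `viaDataFrame`, `viaRdd`) are
hypothetical, and it has not been confirmed that this small data set triggers
the same behaviour.

```scala
// Assumption: run inside spark-shell (Spark 1.5), which provides `sc` and `sqlContext`.
import sqlContext.implicits._

// Hypothetical stand-in for the parquet data: string ids containing duplicates.
val ids = sc.parallelize(Seq("a", "b", "c", "a", "b")).toDF("id").distinct

val total        = ids.count                            // number of distinct ids
val viaDataFrame = ids.intersect(ids).count             // DataFrame.intersect
val viaRdd       = ids.rdd.intersection(ids.rdd).count  // RDD.intersection

// Intersecting a set of distinct rows with itself should keep every row,
// so all three counts are expected to agree.
println(s"distinct=$total dataframe=$viaDataFrame rdd=$viaRdd")
```

If the three numbers differ, the mismatch between `viaDataFrame` and `total`
is the behaviour described in this report.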