Santiago M. Mola created SPARK-6743:
---------------------------------------
Summary: Join with empty projection on one side produces invalid results
Key: SPARK-6743
URL: https://issues.apache.org/jira/browse/SPARK-6743
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.3.0
Reporter: Santiago M. Mola
{code:java}
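import org.apache.spark.sql.SQLContext

// Assumes an existing SparkContext `sc`, e.g. the one provided by spark-shell.
val sqlContext = new SQLContext(sc)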
val tab0 = sc.parallelize(Seq(
(83,0,38),
(26,0,79),
(43,81,24)
))
sqlContext.registerDataFrameAsTable(sqlContext.createDataFrame(tab0),
"tab0")
sqlContext.cacheTable("tab0")
val df1 = sqlContext.sql("SELECT tab0._2, cor0._2 FROM tab0, tab0 cor0 GROUP BY tab0._2, cor0._2")
val result1 = df1.collect()
val df2 = sqlContext.sql("SELECT cor0._2 FROM tab0, tab0 cor0 GROUP BY cor0._2")
val result2 = df2.collect()
val df3 = sqlContext.sql("SELECT cor0._2 FROM tab0 cor0 GROUP BY cor0._2")
val result3 = df3.collect()
{code}
Given the code above, result2 is Row(43), Row(83), Row(26), which is wrong:
these values correspond to cor0._1 instead of cor0._2. The correct result would
be Row(0), Row(81), which is what the third query returns. The first query also
produces valid results; the only difference is that its projection on the left
side of the join is not empty.
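As a sanity check on the expected values, here is a minimal sketch that models
the three queries with plain Scala collections instead of Spark SQL. The names
ExpectedResults, joined, expected1, expected2 and expected3 are illustrative
only and are not part of the reproduction above. The sketch assumes that a
GROUP BY without aggregates returns the distinct values of the grouping columns.

{code:java}
// Plain Scala collections only; no Spark involved (assumption: this merely
// models the expected SQL semantics, it does not exercise the bug itself).
object ExpectedResults {
  // Same rows as tab0 in the reproduction.
  val tab0 = Seq((83, 0, 38), (26, 0, 79), (43, 81, 24))

  // "FROM tab0, tab0 cor0": cross join of the table with itself.
  val joined = for (left <- tab0; right <- tab0) yield (left, right)

  // Query 1: SELECT tab0._2, cor0._2 ... GROUP BY tab0._2, cor0._2
  val expected1 = joined.map { case (left, right) => (left._2, right._2) }.toSet

  // Query 2: SELECT cor0._2 ... GROUP BY cor0._2 (over the join)
  val expected2 = joined.map { case (_, right) => right._2 }.toSet

  // Query 3: SELECT cor0._2 FROM tab0 cor0 GROUP BY cor0._2 (no join)
  val expected3 = tab0.map(_._2).toSet

  def main(args: Array[String]): Unit = {
    // Queries 2 and 3 should agree: the join must not change the distinct values.
    assert(expected2 == Set(0, 81))
    assert(expected2 == expected3)
    assert(expected1 == Set((0, 0), (0, 81), (81, 0), (81, 81)))
    println(s"query 2 and query 3 should both return: $expected2")
  }
}
{code}

Under that assumption, the second and third queries should return the same set
of values, so any difference between result2 and result3 points at the join
rather than at the data.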