StanZhai created SPARK-19766:
--------------------------------
Summary: INNER JOIN on constant alias columns returns incorrect results
Key: SPARK-19766
URL: https://issues.apache.org/jira/browse/SPARK-19766
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.1.0
Reporter: StanZhai
Priority: Critical
We can demonstrate the problem with the following data set and query:
{code}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("test").master("local").getOrCreate()
val sql1 =
"""
|create temporary view t1 as select * from values
|(1)
|as grouping(a)
""".stripMargin
val sql2 =
"""
|create temporary view t2 as select * from values
|(1)
|as grouping(a)
""".stripMargin
val sql3 =
"""
|create temporary view t3 as select * from values
|(1),
|(1)
|as grouping(a)
""".stripMargin
val sql4 =
"""
|create temporary view t4 as select * from values
|(1),
|(1)
|as grouping(a)
""".stripMargin
val sqlA =
"""
|create temporary view ta as
|select a, 'a' as tag from t1 union all
|select a, 'b' as tag from t2
""".stripMargin
val sqlB =
"""
|create temporary view tb as
|select a, 'a' as tag from t3 union all
|select a, 'b' as tag from t4
""".stripMargin
val sql =
"""
|select tb.* from ta inner join tb on
|ta.a = tb.a and
|ta.tag = tb.tag
""".stripMargin
spark.sql(sql1)
spark.sql(sql2)
spark.sql(sql3)
spark.sql(sql4)
spark.sql(sqlA)
spark.sql(sqlB)
spark.sql(sql).show()
{code}
The query returns the following incorrect results:
{code}
+---+---+
|  a|tag|
+---+---+
|  1|  b|
|  1|  b|
|  1|  a|
|  1|  a|
|  1|  b|
|  1|  b|
|  1|  a|
|  1|  a|
+---+---+
{code}
The correct results should be:
{code}
+---+---+
|  a|tag|
+---+---+
|  1|  a|
|  1|  a|
|  1|  b|
|  1|  b|
+---+---+
{code}
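The expected row count can be checked without Spark. The sketch below rebuilds ta and tb as plain Scala collections and evaluates the join by hand. The names TaggedRow, ExpectedJoin, joinedOnBoth and joinedOnAOnly are hypothetical and exist only for this illustration. It also evaluates the join with the tag condition left out, which yields eight rows. That matches the size of the incorrect output above, but it is only an observation; the report does not establish the cause.

```scala
// Hypothetical stand-in for one row of the views ta / tb (not Spark's Row).
final case class TaggedRow(a: Int, tag: String)

object ExpectedJoin {
  // ta = (t1 tagged 'a') union all (t2 tagged 'b'); t1 and t2 hold one row each.
  val ta: Seq[TaggedRow] = Seq(TaggedRow(1, "a")) ++ Seq(TaggedRow(1, "b"))

  // tb = (t3 tagged 'a') union all (t4 tagged 'b'); t3 and t4 hold two rows each.
  val tb: Seq[TaggedRow] =
    Seq(TaggedRow(1, "a"), TaggedRow(1, "a")) ++ Seq(TaggedRow(1, "b"), TaggedRow(1, "b"))

  // Inner join on both conditions, projecting tb.* as the query does.
  val joinedOnBoth: Seq[TaggedRow] =
    for (l <- ta; r <- tb if l.a == r.a && l.tag == r.tag) yield r

  // The same join with the tag condition left out, for comparison only.
  val joinedOnAOnly: Seq[TaggedRow] =
    for (l <- ta; r <- tb if l.a == r.a) yield r

  def main(args: Array[String]): Unit = {
    println(s"join on a and tag: ${joinedOnBoth.size} rows")
    println(s"join on a only:    ${joinedOnAOnly.size} rows")
  }
}
```

With both conditions applied, each of the two rows in ta matches exactly the two rows in tb that carry the same tag, which gives the four rows listed as the correct result.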