[jira] [Created] (SPARK-27282) Spark incorrect results when using UNION with GROUP BY clause

Sofia (JIRA) Tue, 26 Mar 2019 02:29:10 -0700

Sofia created SPARK-27282:
-----------------------------

             Summary: Spark incorrect results when using UNION with GROUP BY 
clause
                 Key: SPARK-27282
                 URL: https://issues.apache.org/jira/browse/SPARK-27282
             Project: Spark
          Issue Type: Bug
          Components: Spark Shell, Spark Submit, SQL
    Affects Versions: 2.3.2
         Environment: I'm using :


IntelliJ  IDEA ==> 2018.1.4

spark-sql and spark-core ==> 2.3.2.3.1.0.0-78 (for HDP 3.1)

scala ==> 2.11.8
            Reporter: Sofia


When using UNION clause after a GROUP BY clause in spark, the results obtained 
are wrong.

The following example explicit this issue:
{code:java}
CREATE TABLE test_un (
col1 varchar(255),
col2 varchar(255),
col3 varchar(255),
col4 varchar(255)
);

INSERT INTO test_un (col1, col2, col3, col4)
VALUES (1,1,2,4),
(1,1,2,4),
(1,1,3,5),
(2,2,2,null);
{code}
I used the following code :
{code:java}
val x = Toolkit.HiveToolkit.getDataFromHive("test","test_un")

val  y = x
   .filter(col("col4")isNotNull)
  .groupBy("col1", "col2","col3")
  .agg(count(col("col3")).alias("cnt"))
  .withColumn("col_name", lit("col3"))
  .select(col("col1"), col("col2"), 
col("col_name"),col("col3").alias("col_value"), col("cnt"))

val z = x
  .filter(col("col4")isNotNull)
  .groupBy("col1", "col2","col4")
  .agg(count(col("col4")).alias("cnt"))
  .withColumn("col_name", lit("col4"))
  .select(col("col1"), col("col2"), 
col("col_name"),col("col4").alias("col_value"), col("cnt"))

y.union(z).show()
{code}
 And i obtained the following results:
||col1||col2||col_name||col_value||cnt||
|1|1|col3|5|1|
|1|1|col3|4|2|
|1|1|col4|5|1|
|1|1|col4|4|2|

Expected results:
||col1||col2||col_name||col_value||cnt||
|1|1|col3|3|1|
|1|1|col3|2|2|
|1|1|col4|4|2|
|1|1|col4|5|1|

But when i remove the last row of the table, i obtain the correct results.
{code:java}
(2,2,2,null){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Created] (SPARK-27282) Spark incorrect results when using UNION with GROUP BY clause

Reply via email to