peter.zhang created SPARK-4217:
----------------------------------
Summary: Result of SparkSQL is incorrect after a table join and group by operation
Key: SPARK-4217
URL: https://issues.apache.org/jira/browse/SPARK-4217
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.1.0
Environment: Hadoop 2.2.0
Spark 1.1
Reporter: peter.zhang
Priority: Critical
I ran a test using the same SQL script in the SparkSQL, Shark, and Hive
environments, as shown below:
---------------------------------------------------------------
select c.theyear, sum(b.amount)
from tblstock a
join tblStockDetail b on a.ordernumber = b.ordernumber
join tbldate c on a.dateid = c.dateid
group by c.theyear;
Result of Hive/Shark:
theyear _c1
2004 1403018
2005 5557850
2006 7203061
2007 11300432
2008 12109328
2009 5365447
2010 188944
Result of SparkSQL:
2010 210924
2004 3265696
2005 13247234
2006 13670416
2007 16711974
2008 14670698
2009 6322137
I'll attach the test data soon.
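
To make the discrepancy easier to see, the two result sets can be lined up by year. The following is only a sketch: the numbers are copied from the listings above, while the names `hive_shark`, `spark_sql`, and `compare` are made up for illustration and are not part of any Spark or Hive API.

```python
# Totals per year, copied from the two result listings in this report.
hive_shark = {
    2004: 1403018,
    2005: 5557850,
    2006: 7203061,
    2007: 11300432,
    2008: 12109328,
    2009: 5365447,
    2010: 188944,
}
spark_sql = {
    2010: 210924,
    2004: 3265696,
    2005: 13247234,
    2006: 13670416,
    2007: 16711974,
    2008: 14670698,
    2009: 6322137,
}


def compare(expected, actual):
    """Return {year: (difference, ratio)} of actual against expected totals."""
    # Both engines must report the same set of years to be comparable.
    assert set(expected) == set(actual), "year sets differ"
    return {
        year: (actual[year] - expected[year], actual[year] / expected[year])
        for year in sorted(expected)
    }


comparison = compare(hive_shark, spark_sql)
for year, (diff, ratio) in comparison.items():
    print(f"{year}  diff={diff:>9}  ratio={ratio:.3f}")
```

Both engines return the same seven years, and every SparkSQL sum is larger than the Hive/Shark one, by a ratio that varies from year to year. The comparison only quantifies the difference; it does not say what causes it.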