[ https://issues.apache.org/jira/browse/SPARK-4217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14197799#comment-14197799 ]

Venkata Ramana G commented on SPARK-4217:
-----------------------------------------

I executed them on Hive 0.12 (from the Hive command line) and on the latest Spark SQL master (from spark-shell, using a HiveContext connected to Hive 0.12).

> Result of SparkSQL is incorrect after a table join and group by operation
> -------------------------------------------------------------------------
>
>                 Key: SPARK-4217
>                 URL: https://issues.apache.org/jira/browse/SPARK-4217
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.1.0
>         Environment: Hadoop 2.2.0
> Spark 1.1
>            Reporter: peter.zhang
>            Priority: Critical
>         Attachments: TestScript.sql, saledata.zip
>
>
> I ran a test using the same SQL script in SparkSQL, Shark, and Hive environments (a pure Hive application rather than Spark HiveContext), as below:
> ---------------------------------------------------------------
> select c.theyear, sum(b.amount)
> from tblstock a
> join tblStockDetail b on a.ordernumber = b.ordernumber
> join tbldate c on a.dateid = c.dateid
> group by c.theyear;
>
> Result of Hive/Shark:
> theyear  _c1
> 2004     1403018
> 2005     5557850
> 2006     7203061
> 2007     11300432
> 2008     12109328
> 2009     5365447
> 2010     188944
>
> Result of SparkSQL:
> 2010     210924
> 2004     3265696
> 2005     13247234
> 2006     13670416
> 2007     16711974
> 2008     14670698
> 2009     6322137
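To make the discrepancy easier to see, the following is a minimal sketch that lines up the two result sets quoted in the issue by year and prints the SparkSQL/Hive ratio for each. The dictionaries are copied from the reported output; the helper name `compare` is hypothetical and not part of Spark or Hive. Rows are matched by key rather than by position, because a `group by` without an `order by` does not guarantee row order (which is why the SparkSQL output starts with 2010).

```python
# Per-year totals copied from the two result sets quoted in the issue.
hive = {2004: 1403018, 2005: 5557850, 2006: 7203061, 2007: 11300432,
        2008: 12109328, 2009: 5365447, 2010: 188944}
spark_sql = {2010: 210924, 2004: 3265696, 2005: 13247234, 2006: 13670416,
             2007: 16711974, 2008: 14670698, 2009: 6322137}


def compare(expected, actual):
    """Hypothetical helper: return (year, expected, actual, actual/expected) rows sorted by year."""
    # Align by year (the dict key), not by row position, since GROUP BY output order is unspecified.
    assert expected.keys() == actual.keys(), "result sets cover different years"
    return [(y, expected[y], actual[y], actual[y] / expected[y])
            for y in sorted(expected)]


rows = compare(hive, spark_sql)
for year, exp, act, ratio in rows:
    print(f"{year}  hive={exp:>10}  sparksql={act:>10}  ratio={ratio:.2f}")
```

Every SparkSQL total is larger than the corresponding Hive/Shark total, and the ratio differs from year to year (roughly 2.3 for 2004 versus roughly 1.1 for 2010), so the SparkSQL figures are not a uniform rescaling of the Hive figures.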