[ https://issues.apache.org/jira/browse/SPARK-19102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
XiaodongCui updated SPARK-19102:
--------------------------------
Description:
The problem is in the result of the code below: the second column's value is not the same across the two queries; the second query's result is 10000 times bigger than the first. The bug only reappears with the pattern SUM(a * b), COUNT(DISTINCT c).

DataFrame df1 = sqlContext.read().parquet("hdfs://cdh01:8020/sandboxdata_A/test/a");
df1.registerTempTable("hd_salesflat");
DataFrame cube5 = sqlContext.sql("SELECT areacode1, SUM(quantity*unitprice) AS sumprice FROM hd_salesflat GROUP BY areacode1");
DataFrame cube6 = sqlContext.sql("SELECT areacode1, SUM(quantity*unitprice) AS sumprice, COUNT(DISTINCT transno) FROM hd_salesflat GROUP BY areacode1");
cube5.show(50);
cube6.show(50);

was: the same description as above, followed by the sample data below.

my data (one sample row from hd_salesflat):
(very wide sample row elided; the fields relevant to the bug are transno=76317828, quantity=1.0000, unitprice=25.0000, i.e. DECIMAL values displayed with four digits of scale)
> Accuracy error of spark SQL results
> -----------------------------------
>
>                 Key: SPARK-19102
>                 URL: https://issues.apache.org/jira/browse/SPARK-19102
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 1.6.0, 1.6.1
>         Environment: Spark 1.6.0, Hadoop 2.6.0, JDK 1.8, CentOS 6.6
>            Reporter: XiaodongCui
>
> The problem is in the result of the code below: the second column's value is not the same across the two queries; the second query's result is 10000 times bigger than the first. The bug only reappears with the pattern SUM(a * b), COUNT(DISTINCT c).
>
> DataFrame df1 = sqlContext.read().parquet("hdfs://cdh01:8020/sandboxdata_A/test/a");
> df1.registerTempTable("hd_salesflat");
> DataFrame cube5 = sqlContext.sql("SELECT areacode1, SUM(quantity*unitprice) AS sumprice FROM hd_salesflat GROUP BY areacode1");
> DataFrame cube6 = sqlContext.sql("SELECT areacode1, SUM(quantity*unitprice) AS sumprice, COUNT(DISTINCT transno) FROM hd_salesflat GROUP BY areacode1");
> cube5.show(50);
> cube6.show(50);

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
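A note on the magnitude of the error: the factor of 10000 is exactly 10^4, and the quantity and unitprice columns in the sample row are DECIMAL values with four digits of scale. This suggests (purely as a hypothesis, not confirmed by the report) that the product's decimal scale is being mishandled somewhere in the aggregation path. A minimal plain-Java sketch, with no Spark dependency, of how reusing the input scale for the product's unscaled value would inflate the result by exactly 10^4:

```java
import java.math.BigDecimal;

public class DecimalScaleRepro {
    public static void main(String[] args) {
        // Sample-row values: both DECIMAL with scale 4.
        BigDecimal quantity  = new BigDecimal("1.0000");   // unscaled 10000, scale 4
        BigDecimal unitprice = new BigDecimal("25.0000");  // unscaled 250000, scale 4

        // Correct: the product of two scale-4 decimals has scale 4 + 4 = 8.
        BigDecimal correct = quantity.multiply(unitprice);

        // Hypothetical bug: reinterpreting the product's unscaled value
        // with the input scale (4) instead of the result scale (8)
        // inflates the value by 10^(8-4) = 10000.
        BigDecimal buggy = new BigDecimal(correct.unscaledValue(), 4);

        System.out.println(correct); // 25.00000000
        System.out.println(buggy);   // 250000.0000
    }
}
```

This matches the observed symptom (sumprice 10000 times too large when COUNT(DISTINCT ...) changes the aggregation plan), but the class and variable names above are illustrative only; the actual cause would need to be confirmed in Spark's decimal aggregation code.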