[
https://issues.apache.org/jira/browse/SPARK-13863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15200160#comment-15200160
]
Xin Wu commented on SPARK-13863:
--------------------------------
In terms of the ordering. the only difference is that the row with Null value
for the order by column (w_warehouse_name) is placed at the top for HIve and
Spark SQL, while the expected result has it at the bottom. Other rows are OK.
So the it seems the expected results have NULL row in the wrong place.
> TPCDS query 66 returns wrong results compared to TPC official result set
> -------------------------------------------------------------------------
>
> Key: SPARK-13863
> URL: https://issues.apache.org/jira/browse/SPARK-13863
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.6.0
> Reporter: JESSE CHEN
> Labels: tpcds-result-mismatch
>
> Testing Spark SQL using TPC queries. Query 66 returns wrong results compared
> to official result set. This is at 1GB SF (validation run).
> Aggregations slightly off -- eg. JAN_SALES column of "Doors canno" row -
> SparkSQL returns 6355232.185385704, expected 6355232.31
> Actual results:
> {noformat}
> [null,null,Fairview,Williamson County,TN,United
> States,DHL,BARIAN,2001,9597806.850651741,1.1121820530080795E7,8670867.81564045,8994785.945689201,1.088724806326294E7,1.4187671518377304E7,9732598.460139751,1.9798897020946026E7,2.1007842467959404E7,2.149551364927292E7,3.479566905774999E7,3.3122997954660416E7,null,null,null,null,null,null,null,null,null,null,null,null,2.1913594697555542E7,3.2518476414670944E7,2.48856624883976E7,2.5698343830046654E7,3.373591080598068E7,3.552703167087555E7,2.5465193481492043E7,5.362323870799959E7,5.1409986978201866E7,5.415917383586836E7,9.222704311805725E7,8.343539111531019E7]
> [Bad cards must make.,621234,Fairview,Williamson County,TN,United
> States,DHL,BARIAN,2001,9506753.593884468,8008140.429557085,6116769.711647987,1.1973045160133362E7,7756254.925520897,5352978.574095726,1.373399613500309E7,1.6418794411203384E7,1.7212743279764652E7,1.704270732417488E7,3.43049358570323E7,3.532416421229005E7,15.30301560102066,12.890698882477594,9.846160563729589,19.273003667109915,12.485238936569628,8.61668642427125,22.107605403121994,26.429323590150222,27.707342611261865,27.433635834765774,55.22063482847413,56.86128610521969,3.0534943928382874E7,2.4481686250203133E7,2.217871080008793E7,2.569579825610423E7,2.995490355044937E7,1.8084140250833035E7,3.0805576178061485E7,4.7156887432252884E7,5.115858869637826E7,5.5759943171424866E7,8.625354428184557E7,8.345155532035494E7]
> [Conventional childr,977787,Fairview,Williamson County,TN,United
> States,DHL,BARIAN,2001,8860645.460736752,1.441581376543355E7,6761497.232810497,1.1820654735879421E7,8246260.600341797,6636877.482845306,1.1434492123092413E7,2.5673812070380323E7,2.3074206999911785E7,2.1834582007320404E7,2.6894900596512794E7,3.357509177109933E7,9.061938296108202,14.743306840276613,6.9151024024767125,12.08919195681618,8.43359606984118,6.787651587559771,11.694256645969329,26.257060147435304,23.598398219562938,22.330611889215547,27.505888906799534,34.337838170377935,2.3836085704864502E7,3.20733132298584E7,2.503790437837982E7,2.2659895963564873E7,2.175740087420273E7,2.4451608012176514E7,2.1933001734852314E7,5.59967034604629E7,5.737188052299309E7,6.208721474336243E7,8.284991027382469E7,8.897031933202875E7]
> [Doors canno,294242,Fairview,Williamson County,TN,United
> States,DHL,BARIAN,2001,6355232.185385704,1.0198920296742141E7,1.0246200903741479E7,1.2209716492156029E7,8566998.262890816,8806316.75278151,9789405.6993227,1.646658496404171E7,2.6443785668474197E7,2.701604788320923E7,3.366058958298761E7,2.7462468750599384E7,21.59865751791282,34.66167405313361,34.822360178837414,41.495491779406166,29.115484067165177,29.928823053070296,33.26991285854059,55.96272783641258,89.87087386734116,91.81574310672585,114.39763726112386,93.33293258813964,2.2645142994330406E7,2.448725452685547E7,2.4925759290207863E7,3.0503655031727314E7,2.6558160276379585E7,2.0976233452690125E7,2.9895796101181984E7,5.600219855566597E7,5.348815865275085E7,7.628723580410767E7,8.248374754962921E7,8.808826726185608E7]
> [Important issues liv,138504,Fairview,Williamson County,TN,United
> States,DHL,BARIAN,2001,1.1748784594717264E7,1.435130566355586E7,9896470.867572784,7990874.805492401,8879247.840401173,7362383.04259038,1.0011144724414349E7,1.7741201390372872E7,2.1346976135887742E7,1.8074978020030975E7,2.967512567988676E7,3.2545325348875403E7,84.8263197793368,103.6165429414014,71.45259969078715,57.694180713137534,64.10824120892663,53.156465102743454,72.28054586448297,128.09161750110374,154.12534032149065,130.5014874662896,214.25464737398747,234.97751219369408,2.7204167203903973E7,2.598037822457385E7,1.9943398915802002E7,2.5710421112384796E7,1.948448105346489E7,2.6346611484448195E7,2.5075158296625137E7,5.409477817043829E7,4.106673223178029E7,5.454705814340496E7,7.246596285337901E7,9.277032812079096E7]
> {noformat}
> Expected results:
> {noformat}
> +----------------------+-------------------+----------+-------------------+---------+---------------+---------------+------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+
> | W_WAREHOUSE_NAME | W_WAREHOUSE_SQ_FT | W_CITY | W_COUNTY |
> W_STATE | W_COUNTRY | SHIP_CARRIERS | YEAR | JAN_SALES |
> FEB_SALES | MAR_SALES | APR_SALES | MAY_SALES | JUN_SALES
> | JUL_SALES | AUG_SALES | SEP_SALES | OCT_SALES |
> NOV_SALES | DEC_SALES | JAN_SALES_PER_SQ_FOOT | FEB_SALES_PER_SQ_FOOT |
> MAR_SALES_PER_SQ_FOOT | APR_SALES_PER_SQ_FOOT | MAY_SALES_PER_SQ_FOOT |
> JUN_SALES_PER_SQ_FOOT | JUL_SALES_PER_SQ_FOOT | AUG_SALES_PER_SQ_FOOT |
> SEP_SALES_PER_SQ_FOOT | OCT_SALES_PER_SQ_FOOT | NOV_SALES_PER_SQ_FOOT |
> DEC_SALES_PER_SQ_FOOT | JAN_NET | FEB_NET | MAR_NET |
> APR_NET | MAY_NET | JUN_NET | JUL_NET |
> AUG_NET | SEP_NET | OCT_NET | NOV_NET | DEC_NET |
> +----------------------+-------------------+----------+-------------------+---------+---------------+---------------+------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+
> | Bad cards must make. | 621234 | Fairview | Williamson County |
> TN | United States | DHL,BARIAN | 2001 | 9506753.46000 |
> 8008140.33000 | 6116769.63000 | 11973045.15000 | 7756254.92000 |
> 5352978.49000 | 13733996.10000 | 16418794.37000 | 17212743.32000 |
> 17042707.41000 | 34304935.61000 | 35324164.21000 | 15.30302 |
> 12.89070 | 9.84616 | 19.27300 |
> 12.48524 | 8.61669 | 22.10761 |
> 26.42932 | 27.70734 | 27.43364 |
> 55.22063 | 56.86129 | 30534943.77000 | 24481685.94000 |
> 22178710.81000 | 25695798.18000 | 29954903.78000 | 18084140.05000 |
> 30805576.13000 | 47156887.22000 | 51158588.86000 | 55759942.80000 |
> 86253544.16000 | 83451555.63000 |
> | Conventional childr | 977787 | Fairview | Williamson County |
> TN | United States | DHL,BARIAN | 2001 | 8860645.55000 |
> 14415813.74000 | 6761497.23000 | 11820654.76000 | 8246260.69000 |
> 6636877.49000 | 11434492.25000 | 25673812.14000 | 23074206.96000 |
> 21834581.94000 | 26894900.53000 | 33575091.74000 | 9.06194 |
> 14.74331 | 6.91510 | 12.08919 |
> 8.43360 | 6.78765 | 11.69426 |
> 26.25706 | 23.59840 | 22.33061 |
> 27.50589 | 34.33784 | 23836085.83000 | 32073313.37000 |
> 25037904.18000 | 22659895.86000 | 21757401.03000 | 24451608.10000 |
> 21933001.85000 | 55996703.43000 | 57371880.44000 | 62087214.51000 |
> 82849910.15000 | 88970319.31000 |
> | Doors canno | 294242 | Fairview | Williamson County |
> TN | United States | DHL,BARIAN | 2001 | 6355232.31000 |
> 10198920.36000 | 10246200.97000 | 12209716.50000 | 8566998.28000 |
> 8806316.81000 | 9789405.60000 | 16466584.88000 | 26443785.61000 |
> 27016047.80000 | 33660589.67000 | 27462468.62000 | 21.59866 |
> 34.66167 | 34.82236 | 41.49549 |
> 29.11548 | 29.92882 | 33.26991 |
> 55.96273 | 89.87087 | 91.81574 |
> 114.39764 | 93.33293 | 22645143.09000 | 24487254.60000 |
> 24925759.42000 | 30503655.27000 | 26558160.29000 | 20976233.52000 |
> 29895796.09000 | 56002198.38000 | 53488158.53000 | 76287235.46000 |
> 82483747.59000 | 88088266.69000 |
> | Important issues liv | 138504 | Fairview | Williamson County |
> TN | United States | DHL,BARIAN | 2001 | 11748784.55000 |
> 14351305.77000 | 9896470.93000 | 7990874.78000 | 8879247.90000 |
> 7362383.09000 | 10011144.75000 | 17741201.32000 | 21346976.05000 |
> 18074978.16000 | 29675125.64000 | 32545325.29000 | 84.82632 |
> 103.61654 | 71.45260 | 57.69418 |
> 64.10824 | 53.15647 | 72.28055 |
> 128.09162 | 154.12534 | 130.50149 |
> 214.25465 | 234.97751 | 27204167.15000 | 25980378.13000 |
> 19943398.93000 | 25710421.13000 | 19484481.03000 | 26346611.48000 |
> 25075158.43000 | 54094778.13000 | 41066732.11000 | 54547058.28000 |
> 72465962.92000 | 92770328.27000 |
> | [NULL] | [NULL] | Fairview | Williamson County |
> TN | United States | DHL,BARIAN | 2001 | 9597806.95000 |
> 11121820.57000 | 8670867.91000 | 8994786.04000 | 10887248.09000 |
> 14187671.36000 | 9732598.41000 | 19798897.07000 | 21007842.34000 |
> 21495513.67000 | 34795669.17000 | 33122997.94000 | [NULL] |
> [NULL] | [NULL] | [NULL] |
> [NULL] | [NULL] | [NULL] |
> [NULL] | [NULL] | [NULL] |
> [NULL] | [NULL] | 21913594.59000 | 32518476.51000 |
> 24885662.72000 | 25698343.86000 | 33735910.61000 | 35527031.58000 |
> 25465193.48000 | 53623238.66000 | 51409986.76000 | 54159173.90000 |
> 92227043.25000 | 83435390.84000 |
> +----------------------+-------------------+----------+-------------------+---------+---------------+---------------+------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+
> {noformat}
> Query used:
> {noformat}
> -- start query 66 in stream 0 using template query66.tpl and seed
> QUALIFICATION
> select
> w_warehouse_name
> ,w_warehouse_sq_ft
> ,w_city
> ,w_county
> ,w_state
> ,w_country
> ,ship_carriers
> ,year
> ,sum(jan_sales) as jan_sales
> ,sum(feb_sales) as feb_sales
> ,sum(mar_sales) as mar_sales
> ,sum(apr_sales) as apr_sales
> ,sum(may_sales) as may_sales
> ,sum(jun_sales) as jun_sales
> ,sum(jul_sales) as jul_sales
> ,sum(aug_sales) as aug_sales
> ,sum(sep_sales) as sep_sales
> ,sum(oct_sales) as oct_sales
> ,sum(nov_sales) as nov_sales
> ,sum(dec_sales) as dec_sales
> ,sum(jan_sales/w_warehouse_sq_ft) as jan_sales_per_sq_foot
> ,sum(feb_sales/w_warehouse_sq_ft) as feb_sales_per_sq_foot
> ,sum(mar_sales/w_warehouse_sq_ft) as mar_sales_per_sq_foot
> ,sum(apr_sales/w_warehouse_sq_ft) as apr_sales_per_sq_foot
> ,sum(may_sales/w_warehouse_sq_ft) as may_sales_per_sq_foot
> ,sum(jun_sales/w_warehouse_sq_ft) as jun_sales_per_sq_foot
> ,sum(jul_sales/w_warehouse_sq_ft) as jul_sales_per_sq_foot
> ,sum(aug_sales/w_warehouse_sq_ft) as aug_sales_per_sq_foot
> ,sum(sep_sales/w_warehouse_sq_ft) as sep_sales_per_sq_foot
> ,sum(oct_sales/w_warehouse_sq_ft) as oct_sales_per_sq_foot
> ,sum(nov_sales/w_warehouse_sq_ft) as nov_sales_per_sq_foot
> ,sum(dec_sales/w_warehouse_sq_ft) as dec_sales_per_sq_foot
> ,sum(jan_net) as jan_net
> ,sum(feb_net) as feb_net
> ,sum(mar_net) as mar_net
> ,sum(apr_net) as apr_net
> ,sum(may_net) as may_net
> ,sum(jun_net) as jun_net
> ,sum(jul_net) as jul_net
> ,sum(aug_net) as aug_net
> ,sum(sep_net) as sep_net
> ,sum(oct_net) as oct_net
> ,sum(nov_net) as nov_net
> ,sum(dec_net) as dec_net
> from (
> select
> w_warehouse_name
> ,w_warehouse_sq_ft
> ,w_city
> ,w_county
> ,w_state
> ,w_country
> ,concat('DHL', ',', 'BARIAN') as ship_carriers
> ,d_year as year
> ,sum(case when d_moy = 1
> then ws_ext_sales_price* ws_quantity else 0 end) as jan_sales
> ,sum(case when d_moy = 2
> then ws_ext_sales_price* ws_quantity else 0 end) as feb_sales
> ,sum(case when d_moy = 3
> then ws_ext_sales_price* ws_quantity else 0 end) as mar_sales
> ,sum(case when d_moy = 4
> then ws_ext_sales_price* ws_quantity else 0 end) as apr_sales
> ,sum(case when d_moy = 5
> then ws_ext_sales_price* ws_quantity else 0 end) as may_sales
> ,sum(case when d_moy = 6
> then ws_ext_sales_price* ws_quantity else 0 end) as jun_sales
> ,sum(case when d_moy = 7
> then ws_ext_sales_price* ws_quantity else 0 end) as jul_sales
> ,sum(case when d_moy = 8
> then ws_ext_sales_price* ws_quantity else 0 end) as aug_sales
> ,sum(case when d_moy = 9
> then ws_ext_sales_price* ws_quantity else 0 end) as sep_sales
> ,sum(case when d_moy = 10
> then ws_ext_sales_price* ws_quantity else 0 end) as oct_sales
> ,sum(case when d_moy = 11
> then ws_ext_sales_price* ws_quantity else 0 end) as nov_sales
> ,sum(case when d_moy = 12
> then ws_ext_sales_price* ws_quantity else 0 end) as dec_sales
> ,sum(case when d_moy = 1
> then ws_net_paid * ws_quantity else 0 end) as jan_net
> ,sum(case when d_moy = 2
> then ws_net_paid * ws_quantity else 0 end) as feb_net
> ,sum(case when d_moy = 3
> then ws_net_paid * ws_quantity else 0 end) as mar_net
> ,sum(case when d_moy = 4
> then ws_net_paid * ws_quantity else 0 end) as apr_net
> ,sum(case when d_moy = 5
> then ws_net_paid * ws_quantity else 0 end) as may_net
> ,sum(case when d_moy = 6
> then ws_net_paid * ws_quantity else 0 end) as jun_net
> ,sum(case when d_moy = 7
> then ws_net_paid * ws_quantity else 0 end) as jul_net
> ,sum(case when d_moy = 8
> then ws_net_paid * ws_quantity else 0 end) as aug_net
> ,sum(case when d_moy = 9
> then ws_net_paid * ws_quantity else 0 end) as sep_net
> ,sum(case when d_moy = 10
> then ws_net_paid * ws_quantity else 0 end) as oct_net
> ,sum(case when d_moy = 11
> then ws_net_paid * ws_quantity else 0 end) as nov_net
> ,sum(case when d_moy = 12
> then ws_net_paid * ws_quantity else 0 end) as dec_net
> from
> web_sales
> ,warehouse
> ,date_dim
> ,time_dim
> ,ship_mode
> where
> ws_warehouse_sk = w_warehouse_sk
> and ws_sold_date_sk = d_date_sk
> and ws_sold_time_sk = t_time_sk
> and ws_ship_mode_sk = sm_ship_mode_sk
> and d_year = 2001
> and t_time between 30838 and 30838+28800
> and sm_carrier in ('DHL','BARIAN')
> group by
> w_warehouse_name
> ,w_warehouse_sq_ft
> ,w_city
> ,w_county
> ,w_state
> ,w_country
> ,d_year
> union all
> select
> w_warehouse_name
> ,w_warehouse_sq_ft
> ,w_city
> ,w_county
> ,w_state
> ,w_country
> ,concat('DHL', ',', 'BARIAN') as ship_carriers
> ,d_year as year
> ,sum(case when d_moy = 1
> then cs_sales_price* cs_quantity else 0 end) as jan_sales
> ,sum(case when d_moy = 2
> then cs_sales_price* cs_quantity else 0 end) as feb_sales
> ,sum(case when d_moy = 3
> then cs_sales_price* cs_quantity else 0 end) as mar_sales
> ,sum(case when d_moy = 4
> then cs_sales_price* cs_quantity else 0 end) as apr_sales
> ,sum(case when d_moy = 5
> then cs_sales_price* cs_quantity else 0 end) as may_sales
> ,sum(case when d_moy = 6
> then cs_sales_price* cs_quantity else 0 end) as jun_sales
> ,sum(case when d_moy = 7
> then cs_sales_price* cs_quantity else 0 end) as jul_sales
> ,sum(case when d_moy = 8
> then cs_sales_price* cs_quantity else 0 end) as aug_sales
> ,sum(case when d_moy = 9
> then cs_sales_price* cs_quantity else 0 end) as sep_sales
> ,sum(case when d_moy = 10
> then cs_sales_price* cs_quantity else 0 end) as oct_sales
> ,sum(case when d_moy = 11
> then cs_sales_price* cs_quantity else 0 end) as nov_sales
> ,sum(case when d_moy = 12
> then cs_sales_price* cs_quantity else 0 end) as dec_sales
> ,sum(case when d_moy = 1
> then cs_net_paid_inc_tax * cs_quantity else 0 end) as jan_net
> ,sum(case when d_moy = 2
> then cs_net_paid_inc_tax * cs_quantity else 0 end) as feb_net
> ,sum(case when d_moy = 3
> then cs_net_paid_inc_tax * cs_quantity else 0 end) as mar_net
> ,sum(case when d_moy = 4
> then cs_net_paid_inc_tax * cs_quantity else 0 end) as apr_net
> ,sum(case when d_moy = 5
> then cs_net_paid_inc_tax * cs_quantity else 0 end) as may_net
> ,sum(case when d_moy = 6
> then cs_net_paid_inc_tax * cs_quantity else 0 end) as jun_net
> ,sum(case when d_moy = 7
> then cs_net_paid_inc_tax * cs_quantity else 0 end) as jul_net
> ,sum(case when d_moy = 8
> then cs_net_paid_inc_tax * cs_quantity else 0 end) as aug_net
> ,sum(case when d_moy = 9
> then cs_net_paid_inc_tax * cs_quantity else 0 end) as sep_net
> ,sum(case when d_moy = 10
> then cs_net_paid_inc_tax * cs_quantity else 0 end) as oct_net
> ,sum(case when d_moy = 11
> then cs_net_paid_inc_tax * cs_quantity else 0 end) as nov_net
> ,sum(case when d_moy = 12
> then cs_net_paid_inc_tax * cs_quantity else 0 end) as dec_net
> from
> catalog_sales
> ,warehouse
> ,date_dim
> ,time_dim
> ,ship_mode
> where
> cs_warehouse_sk = w_warehouse_sk
> and cs_sold_date_sk = d_date_sk
> and cs_sold_time_sk = t_time_sk
> and cs_ship_mode_sk = sm_ship_mode_sk
> and d_year = 2001
> and t_time between 30838 and 30838+28800
> and sm_carrier in ('DHL','BARIAN')
> group by
> w_warehouse_name
> ,w_warehouse_sq_ft
> ,w_city
> ,w_county
> ,w_state
> ,w_country
> ,d_year
> ) x
> group by
> w_warehouse_name
> ,w_warehouse_sq_ft
> ,w_city
> ,w_county
> ,w_state
> ,w_country
> ,ship_carriers
> ,year
> order by w_warehouse_name
> limit 100;
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]