[ https://issues.apache.org/jira/browse/SPARK-13859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
JESSE CHEN closed SPARK-13859.
------------------------------
Resolution: Not A Bug
Fix Version/s: 2.0.0
The solution is to revert to the original TPC query, which uses INTERSECT &
EXCEPT; it has been validated to return the correct results in Spark 2.0. The
null-safe version will remain as a variant of this query (for Hive). An
internal toolkit defect is open as RTC 124749.
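
The comment above does not spell out why the JOIN rewrite diverged from the
INTERSECT form. One difference that fits the mention of a "null-safe" variant
is that INTERSECT treats NULLs as equal when comparing rows, while a join on
plain `=` drops rows whose key columns are NULL. The sketch below illustrates
only that difference and does not reproduce the reported count of 0. It makes
several assumptions: SQLite stands in for Spark, the tables `store_cust` and
`web_cust` and their rows are hypothetical, and SQLite's `IS` operator plays
the role that the null-safe `<=>` operator plays in Spark SQL and Hive.

```python
import sqlite3

# Hypothetical stand-in tables; SQLite is used here instead of Spark.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE store_cust (c_last_name TEXT, c_first_name TEXT, d_date TEXT);
    CREATE TABLE web_cust   (c_last_name TEXT, c_first_name TEXT, d_date TEXT);
    INSERT INTO store_cust VALUES ('Smith', 'Ann', '2000-01-01'),
                                  (NULL,    'Bob', '2000-01-02');
    INSERT INTO web_cust   VALUES ('Smith', 'Ann', '2000-01-01'),
                                  (NULL,    'Bob', '2000-01-02');
""")


def count(sql):
    """Run a query that returns a single count and return it as an int."""
    return conn.execute(sql).fetchone()[0]


# INTERSECT compares rows with NULLs treated as equal to each other.
intersect_count = count("""
    SELECT count(*) FROM (
        SELECT DISTINCT c_last_name, c_first_name, d_date FROM store_cust
        INTERSECT
        SELECT DISTINCT c_last_name, c_first_name, d_date FROM web_cust
    )
""")

# A join on plain '=' never matches NULL to NULL, so the NULL row is lost.
plain_join_count = count("""
    SELECT count(*) FROM
        (SELECT DISTINCT c_last_name, c_first_name, d_date FROM store_cust) t1
    JOIN
        (SELECT DISTINCT c_last_name, c_first_name, d_date FROM web_cust) t2
      ON t1.c_last_name = t2.c_last_name
     AND t1.c_first_name = t2.c_first_name
     AND t1.d_date = t2.d_date
""")

# Null-safe comparison: 'IS' in SQLite (assumed analogue of '<=>' in Spark/Hive).
null_safe_join_count = count("""
    SELECT count(*) FROM
        (SELECT DISTINCT c_last_name, c_first_name, d_date FROM store_cust) t1
    JOIN
        (SELECT DISTINCT c_last_name, c_first_name, d_date FROM web_cust) t2
      ON t1.c_last_name IS t2.c_last_name
     AND t1.c_first_name IS t2.c_first_name
     AND t1.d_date IS t2.d_date
""")

print(intersect_count, plain_join_count, null_safe_join_count)
```

With this toy data the plain-equality join counts one row fewer than the
INTERSECT and null-safe forms, because the row with a NULL last name cannot
match itself under `=`.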
> TPCDS query 38 returns wrong results compared to TPC official result set
> -------------------------------------------------------------------------
>
> Key: SPARK-13859
> URL: https://issues.apache.org/jira/browse/SPARK-13859
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.6.0
> Reporter: JESSE CHEN
> Labels: tpcds-result-mismatch
> Fix For: 2.0.0
>
>
> While testing Spark SQL with the TPC queries, query 38 returns wrong results
> compared to the official result set. This is at 1GB SF (validation run).
> Spark SQL returns a count of 0; the answer set reports 107.
> Actual results:
> {noformat}
> [0]
> {noformat}
> Expected:
> {noformat}
> +-----+
> | 1 |
> +-----+
> | 107 |
> +-----+
> {noformat}
> query used:
> {noformat}
> -- start query 38 in stream 0 using template query38.tpl and seed QUALIFICATION
> select count(*) from (
>     select distinct c_last_name, c_first_name, d_date
>     from store_sales
>     JOIN date_dim ON store_sales.ss_sold_date_sk = date_dim.d_date_sk
>     JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk
>     where d_month_seq between 1200 and 1200 + 11) tmp1
> JOIN (
>     select distinct c_last_name, c_first_name, d_date
>     from catalog_sales
>     JOIN date_dim ON catalog_sales.cs_sold_date_sk = date_dim.d_date_sk
>     JOIN customer ON catalog_sales.cs_bill_customer_sk = customer.c_customer_sk
>     where d_month_seq between 1200 and 1200 + 11) tmp2
>   ON (tmp1.c_last_name = tmp2.c_last_name)
>   and (tmp1.c_first_name = tmp2.c_first_name)
>   and (tmp1.d_date = tmp2.d_date)
> JOIN (
>     select distinct c_last_name, c_first_name, d_date
>     from web_sales
>     JOIN date_dim ON web_sales.ws_sold_date_sk = date_dim.d_date_sk
>     JOIN customer ON web_sales.ws_bill_customer_sk = customer.c_customer_sk
>     where d_month_seq between 1200 and 1200 + 11) tmp3
>   ON (tmp1.c_last_name = tmp3.c_last_name)
>   and (tmp1.c_first_name = tmp3.c_first_name)
>   and (tmp1.d_date = tmp3.d_date)
> limit 100
> ;
> -- end query 38 in stream 0 using template query38.tpl
> {noformat}