JESSE CHEN created SPARK-15372:
----------------------------------
Summary: TPC-DS Query 84 returns wrong results against TPC official
Key: SPARK-15372
URL: https://issues.apache.org/jira/browse/SPARK-15372
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.0.0
Reporter: JESSE CHEN
Assignee: Herman van Hovell
Priority: Critical
Fix For: 2.0.0
The official TPC-DS query 41 fails with the following error:
{noformat}
Error in query: The correlated scalar subquery can only contain equality
predicates: (((i_manufact#38 = i_manufact#16) && (((((i_category#36 = Women) &&
((i_color#41 = powder) || (i_color#41 = khaki))) && (((i_units#42 = Ounce) ||
(i_units#42 = Oz)) && ((i_size#39 = medium) || (i_size#39 = extra large)))) ||
(((i_category#36 = Women) && ((i_color#41 = brown) || (i_color#41 = honeydew)))
&& (((i_units#42 = Bunch) || (i_units#42 = Ton)) && ((i_size#39 = N/A) ||
(i_size#39 = small))))) || ((((i_category#36 = Men) && ((i_color#41 = floral)
|| (i_color#41 = deep))) && (((i_units#42 = N/A) || (i_units#42 = Dozen)) &&
((i_size#39 = petite) || (i_size#39 = large)))) || (((i_category#36 = Men) &&
((i_color#41 = light) || (i_color#41 = cornflower))) && (((i_units#42 = Box) ||
(i_units#42 = Pound)) && ((i_size#39 = medium) || (i_size#39 = extra
large))))))) || ((i_manufact#38 = i_manufact#16) && (((((i_category#36 = Women)
&& ((i_color#41 = midnight) || (i_color#41 = snow))) && (((i_units#42 = Pallet)
|| (i_units#42 = Gross)) && ((i_size#39 = medium) || (i_size#39 = extra
large)))) || (((i_category#36 = Women) && ((i_color#41 = cyan) || (i_color#41 =
papaya))) && (((i_units#42 = Cup) || (i_units#42 = Dram)) && ((i_size#39 = N/A)
|| (i_size#39 = small))))) || ((((i_category#36 = Men) && ((i_color#41 =
orange) || (i_color#41 = frosted))) && (((i_units#42 = Each) || (i_units#42 =
Tbl)) && ((i_size#39 = petite) || (i_size#39 = large)))) || (((i_category#36 =
Men) && ((i_color#41 = forest) || (i_color#41 = ghost))) && (((i_units#42 = Lb)
|| (i_units#42 = Bundle)) && ((i_size#39 = medium) || (i_size#39 = extra
large))))))));
{noformat}
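In this query, the correlated predicate `i_manufact = i1.i_manufact` never appears as a top-level conjunct of the subquery's WHERE clause. Instead, it is repeated inside each branch of an OR, so the correlated condition as a whole is a disjunction rather than an equality. Both branches share that equality, which gives the predicate the shape `(a && b) || (a && c)`. That shape is logically equivalent to `a && (b || c)`.

The Scala sketch below illustrates this with a tiny expression model. The names `Expr`, `Col`, `Eq`, `And`, `Or`, `onlyEqualityCorrelation` and `factorCommonConjuncts` are hypothetical and are not Spark Catalyst classes or Spark's actual analyzer check. The sketch assumes the restriction applies to the top-level conjuncts that reference the outer query. It shows only why the original form is rejected and how factoring out the shared equality would expose it. It makes no claim about how the fix for this issue was actually implemented.

```scala
object CorrelatedPredicateSketch {
  // Hypothetical, minimal expression model; these are NOT Spark Catalyst classes.
  sealed trait Expr
  final case class Col(name: String, outer: Boolean = false) extends Expr
  final case class Lit(value: String) extends Expr
  final case class Eq(left: Expr, right: Expr) extends Expr
  final case class And(left: Expr, right: Expr) extends Expr
  final case class Or(left: Expr, right: Expr) extends Expr

  // True if the expression references a column of the outer query (e.g. i1.i_manufact).
  def isCorrelated(e: Expr): Boolean = e match {
    case Col(_, outer) => outer
    case Lit(_)        => false
    case Eq(l, r)      => isCorrelated(l) || isCorrelated(r)
    case And(l, r)     => isCorrelated(l) || isCorrelated(r)
    case Or(l, r)      => isCorrelated(l) || isCorrelated(r)
  }

  // Flatten nested ANDs / ORs into a flat list of operands.
  def conjuncts(e: Expr): List[Expr] = e match {
    case And(l, r) => conjuncts(l) ++ conjuncts(r)
    case other     => List(other)
  }
  def disjuncts(e: Expr): List[Expr] = e match {
    case Or(l, r) => disjuncts(l) ++ disjuncts(r)
    case other    => List(other)
  }

  // Approximation (assumption) of the restriction in the error message:
  // every top-level conjunct that references the outer query must be a plain equality.
  def onlyEqualityCorrelation(e: Expr): Boolean =
    conjuncts(e).filter(isCorrelated).forall {
      case Eq(_, _) => true
      case _        => false
    }

  // Rewrite (a && b) || (a && c) into a && (b || c) by pulling out
  // the conjuncts shared by every OR branch.
  def factorCommonConjuncts(e: Expr): Expr = {
    val branches = disjuncts(e).map(conjuncts)
    val common = branches.reduce[List[Expr]]((x, y) => x.filter(c => y.contains(c)))
    if (common.isEmpty) e
    else {
      val rests = branches.map(b => b.filterNot(c => common.contains(c)))
      val shared = common.reduce[Expr](And(_, _))
      // If some branch consisted only of shared conjuncts, the OR collapses to them.
      if (rests.exists(_.isEmpty)) shared
      else And(shared, rests.map(r => r.reduce[Expr](And(_, _))).reduce[Expr](Or(_, _)))
    }
  }

  // A cut-down version of the q41 subquery predicate: the correlation
  // i_manufact = i1.i_manufact is repeated inside each OR branch.
  val manufactEq: Expr = Eq(Col("i_manufact"), Col("i_manufact", outer = true))
  def attrs(category: String, color: String): Expr =
    And(Eq(Col("i_category"), Lit(category)), Eq(Col("i_color"), Lit(color)))
  val original: Expr =
    Or(And(manufactEq, attrs("Women", "powder")), And(manufactEq, attrs("Men", "floral")))
  val factored: Expr = factorCommonConjuncts(original)

  def main(args: Array[String]): Unit = {
    println(onlyEqualityCorrelation(original)) // false: the correlation is hidden under OR
    println(onlyEqualityCorrelation(factored)) // true: the equality is now a top-level conjunct
  }
}
```

AND distributes over OR under SQL's three-valued logic as well, so this rewrite does not change which rows satisfy the predicate.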
The parsed and analyzed logical plans are as follows:
{noformat}
== Parsed Logical Plan ==
'GlobalLimit 100
+- 'LocalLimit 100
+- 'Sort ['i_product_name ASC], true
+- 'Distinct
+- 'Project ['i_product_name]
+- 'Filter ((('i_manufact_id >= 738) && ('i_manufact_id <= (738 +
40))) && (scalar-subquery#1 [] > 0))
: +- 'SubqueryAlias scalar-subquery#1 []
: +- 'Project ['count(1) AS item_cnt#0]
: +- 'Filter ((('i_manufact = 'i1.i_manufact) &&
((((('i_category = Women) && (('i_color = powder) || ('i_color = khaki))) &&
((('i_units = Ounce) || ('i_units = Oz)) && (('i_size = medium) || ('i_size =
extra large)))) || ((('i_category = Women) && (('i_color = brown) || ('i_color
= honeydew))) && ((('i_units = Bunch) || ('i_units = Ton)) && (('i_size = N/A)
|| ('i_size = small))))) || (((('i_category = Men) && (('i_color = floral) ||
('i_color = deep))) && ((('i_units = N/A) || ('i_units = Dozen)) && (('i_size =
petite) || ('i_size = large)))) || ((('i_category = Men) && (('i_color = light)
|| ('i_color = cornflower))) && ((('i_units = Box) || ('i_units = Pound)) &&
(('i_size = medium) || ('i_size = extra large))))))) || (('i_manufact =
'i1.i_manufact) && ((((('i_category = Women) && (('i_color = midnight) ||
('i_color = snow))) && ((('i_units = Pallet) || ('i_units = Gross)) &&
(('i_size = medium) || ('i_size = extra large)))) || ((('i_category = Women) &&
(('i_color = cyan) || ('i_color = papaya))) && ((('i_units = Cup) || ('i_units
= Dram)) && (('i_size = N/A) || ('i_size = small))))) || (((('i_category = Men)
&& (('i_color = orange) || ('i_color = frosted))) && ((('i_units = Each) ||
('i_units = Tbl)) && (('i_size = petite) || ('i_size = large)))) ||
((('i_category = Men) && (('i_color = forest) || ('i_color = ghost))) &&
((('i_units = Lb) || ('i_units = Bundle)) && (('i_size = medium) || ('i_size =
extra large))))))))
: +- 'UnresolvedRelation `item`, None
+- 'UnresolvedRelation `item`, Some(i1)
== Analyzed Logical Plan ==
i_product_name: string
GlobalLimit 100
+- LocalLimit 100
+- Sort [i_product_name#24 ASC], true
+- Distinct
+- Project [i_product_name#24]
+- Filter (((i_manufact_id#16L >= cast(738 as bigint)) &&
(i_manufact_id#16L <= cast((738 + 40) as bigint))) && (scalar-subquery#1
[(((i_manufact#39 = i_manufact#17) && (((((i_category#37 = Women) &&
((i_color#42 = powder) || (i_color#42 = khaki))) && (((i_units#43 = Ounce) ||
(i_units#43 = Oz)) && ((i_size#40 = medium) || (i_size#40 = extra large)))) ||
(((i_category#37 = Women) && ((i_color#42 = brown) || (i_color#42 = honeydew)))
&& (((i_units#43 = Bunch) || (i_units#43 = Ton)) && ((i_size#40 = N/A) ||
(i_size#40 = small))))) || ((((i_category#37 = Men) && ((i_color#42 = floral)
|| (i_color#42 = deep))) && (((i_units#43 = N/A) || (i_units#43 = Dozen)) &&
((i_size#40 = petite) || (i_size#40 = large)))) || (((i_category#37 = Men) &&
((i_color#42 = light) || (i_color#42 = cornflower))) && (((i_units#43 = Box) ||
(i_units#43 = Pound)) && ((i_size#40 = medium) || (i_size#40 = extra
large))))))) || ((i_manufact#39 = i_manufact#17) && (((((i_category#37 = Women)
&& ((i_color#42 = midnight) || (i_color#42 = snow))) && (((i_units#43 = Pallet)
|| (i_units#43 = Gross)) && ((i_size#40 = medium) || (i_size#40 = extra
large)))) || (((i_category#37 = Women) && ((i_color#42 = cyan) || (i_color#42 =
papaya))) && (((i_units#43 = Cup) || (i_units#43 = Dram)) && ((i_size#40 = N/A)
|| (i_size#40 = small))))) || ((((i_category#37 = Men) && ((i_color#42 =
orange) || (i_color#42 = frosted))) && (((i_units#43 = Each) || (i_units#43 =
Tbl)) && ((i_size#40 = petite) || (i_size#40 = large)))) || (((i_category#37 =
Men) && ((i_color#42 = forest) || (i_color#42 = ghost))) && (((i_units#43 = Lb)
|| (i_units#43 = Bundle)) && ((i_size#40 = medium) || (i_size#40 = extra
large))))))))] > cast(0 as bigint)))
: +- SubqueryAlias scalar-subquery#1 [(((i_manufact#39 =
i_manufact#17) && (((((i_category#37 = Women) && ((i_color#42 = powder) ||
(i_color#42 = khaki))) && (((i_units#43 = Ounce) || (i_units#43 = Oz)) &&
((i_size#40 = medium) || (i_size#40 = extra large)))) || (((i_category#37 =
Women) && ((i_color#42 = brown) || (i_color#42 = honeydew))) && (((i_units#43 =
Bunch) || (i_units#43 = Ton)) && ((i_size#40 = N/A) || (i_size#40 = small)))))
|| ((((i_category#37 = Men) && ((i_color#42 = floral) || (i_color#42 = deep)))
&& (((i_units#43 = N/A) || (i_units#43 = Dozen)) && ((i_size#40 = petite) ||
(i_size#40 = large)))) || (((i_category#37 = Men) && ((i_color#42 = light) ||
(i_color#42 = cornflower))) && (((i_units#43 = Box) || (i_units#43 = Pound)) &&
((i_size#40 = medium) || (i_size#40 = extra large))))))) || ((i_manufact#39 =
i_manufact#17) && (((((i_category#37 = Women) && ((i_color#42 = midnight) ||
(i_color#42 = snow))) && (((i_units#43 = Pallet) || (i_units#43 = Gross)) &&
((i_size#40 = medium) || (i_size#40 = extra large)))) || (((i_category#37 =
Women) && ((i_color#42 = cyan) || (i_color#42 = papaya))) && (((i_units#43 =
Cup) || (i_units#43 = Dram)) && ((i_size#40 = N/A) || (i_size#40 = small)))))
|| ((((i_category#37 = Men) && ((i_color#42 = orange) || (i_color#42 =
frosted))) && (((i_units#43 = Each) || (i_units#43 = Tbl)) && ((i_size#40 =
petite) || (i_size#40 = large)))) || (((i_category#37 = Men) && ((i_color#42 =
forest) || (i_color#42 = ghost))) && (((i_units#43 = Lb) || (i_units#43 =
Bundle)) && ((i_size#40 = medium) || (i_size#40 = extra large))))))))]
: +- Aggregate [i_manufact#39,i_category#37,i_size#40,i_units#43,i_color#42], [(count(1),mode=Complete,isDistinct=false) AS item_cnt#0L,i_manufact#39,i_category#37,i_size#40,i_units#43,i_color#42]
: +- MetastoreRelation hadoopds1g, item, None
+- SubqueryAlias i1
+- Relation[i_item_sk#3,i_item_id#4,i_rec_start_date#5,i_rec_end_date#6,i_item_desc#7,i_current_price#8,i_wholesale_cost#9,i_brand_id#10L,i_brand#11,i_class_id#12L,i_class#13,i_category_id#14L,i_category#15,i_manufact_id#16L,i_manufact#17,i_size#18,i_formulation#19,i_color#20,i_units#21,i_container#22,i_manager_id#23L,i_product_name#24] HadoopFiles
{noformat}
Note that q41 in
https://github.com/sameeragarwal/spark/blob/tpcds-more/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/tpcds/TPCDS_1_4_Benchmark.scala
is NOT the official TPC-DS query, nor does it limit itself to the "minor query
modifications" that TPC-DS allows. That version works in the nightly build, but
we cannot claim it is a TPC-DS query.