JESSE CHEN created SPARK-15372:
----------------------------------

             Summary: TPC-DS Query 84 returns wrong results against TPC official
                 Key: SPARK-15372
                 URL: https://issues.apache.org/jira/browse/SPARK-15372
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.0.0
            Reporter: JESSE CHEN
            Assignee: Herman van Hovell
            Priority: Critical
             Fix For: 2.0.0


The official TPC-DS query 41 fails with the following error:

{noformat}
Error in query: The correlated scalar subquery can only contain equality 
predicates: (((i_manufact#38 = i_manufact#16) && (((((i_category#36 = Women) && 
((i_color#41 = powder) || (i_color#41 = khaki))) && (((i_units#42 = Ounce) || 
(i_units#42 = Oz)) && ((i_size#39 = medium) || (i_size#39 = extra large)))) || 
(((i_category#36 = Women) && ((i_color#41 = brown) || (i_color#41 = honeydew))) 
&& (((i_units#42 = Bunch) || (i_units#42 = Ton)) && ((i_size#39 = N/A) || 
(i_size#39 = small))))) || ((((i_category#36 = Men) && ((i_color#41 = floral) 
|| (i_color#41 = deep))) && (((i_units#42 = N/A) || (i_units#42 = Dozen)) && 
((i_size#39 = petite) || (i_size#39 = large)))) || (((i_category#36 = Men) && 
((i_color#41 = light) || (i_color#41 = cornflower))) && (((i_units#42 = Box) || 
(i_units#42 = Pound)) && ((i_size#39 = medium) || (i_size#39 = extra 
large))))))) || ((i_manufact#38 = i_manufact#16) && (((((i_category#36 = Women) 
&& ((i_color#41 = midnight) || (i_color#41 = snow))) && (((i_units#42 = Pallet) 
|| (i_units#42 = Gross)) && ((i_size#39 = medium) || (i_size#39 = extra 
large)))) || (((i_category#36 = Women) && ((i_color#41 = cyan) || (i_color#41 = 
papaya))) && (((i_units#42 = Cup) || (i_units#42 = Dram)) && ((i_size#39 = N/A) 
|| (i_size#39 = small))))) || ((((i_category#36 = Men) && ((i_color#41 = 
orange) || (i_color#41 = frosted))) && (((i_units#42 = Each) || (i_units#42 = 
Tbl)) && ((i_size#39 = petite) || (i_size#39 = large)))) || (((i_category#36 = 
Men) && ((i_color#41 = forest) || (i_color#41 = ghost))) && (((i_units#42 = Lb) 
|| (i_units#42 = Bundle)) && ((i_size#39 = medium) || (i_size#39 = extra 
large))))))));
{noformat}
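
The predicate in the error has the shape (c AND X) OR (c AND Y), where c is the correlation equality i_manufact = i1.i_manufact and X, Y are the uncorrelated category/color/units/size branches. The following is a minimal sketch of that shape, not a Spark reproduction: it assumes Python's standard-library sqlite3 module as a stand-in engine, and the table rows and the reduced two-branch predicate are hypothetical. It only illustrates that the reported form is logically equivalent to c AND (X OR Y), in which the correlated part is a plain equality.

```python
import sqlite3

# Hypothetical miniature of the TPC-DS `item` table: only the columns that
# appear in the plan, filled with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item (i_product_name TEXT, i_manufact_id INTEGER, "
    "i_manufact TEXT, i_category TEXT, i_color TEXT, i_units TEXT, i_size TEXT)"
)
conn.executemany(
    "INSERT INTO item VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("p1", 740, "m1", "Women", "powder", "Ounce", "medium"),
        ("p2", 750, "m2", "Men", "red", "Box", "large"),          # matches no branch
        ("p3", 760, "m3", "Men", "forest", "Lb", "extra large"),
        ("p4", 900, "m1", "Women", "khaki", "Oz", "medium"),      # outside the id range
    ],
)

# Two uncorrelated branches, reduced from the sixteen in the reported predicate.
BRANCH_X = (
    "(i_category = 'Women' AND (i_color = 'powder' OR i_color = 'khaki') "
    "AND (i_units = 'Ounce' OR i_units = 'Oz') "
    "AND (i_size = 'medium' OR i_size = 'extra large'))"
)
BRANCH_Y = (
    "(i_category = 'Men' AND (i_color = 'forest' OR i_color = 'ghost') "
    "AND (i_units = 'Lb' OR i_units = 'Bundle') "
    "AND (i_size = 'medium' OR i_size = 'extra large'))"
)

# Outer query following the parsed plan: range filter, scalar subquery > 0,
# DISTINCT, ORDER BY, LIMIT 100.
OUTER = """
SELECT DISTINCT i_product_name
FROM item i1
WHERE i_manufact_id >= 738 AND i_manufact_id <= 738 + 40
  AND (SELECT count(*) AS item_cnt FROM item WHERE {predicate}) > 0
ORDER BY i_product_name
LIMIT 100
"""

# Shape from the error message: the correlation equality is repeated inside an OR.
original_shape = OUTER.format(
    predicate=f"(i_manufact = i1.i_manufact AND {BRANCH_X}) "
              f"OR (i_manufact = i1.i_manufact AND {BRANCH_Y})"
)
# Factored shape: the correlation equality is a single top-level conjunct.
factored_shape = OUTER.format(
    predicate=f"i_manufact = i1.i_manufact AND ({BRANCH_X} OR {BRANCH_Y})"
)

def run(query):
    """Execute a query and return the product names as a list."""
    return [row[0] for row in conn.execute(query)]

result_original = run(original_shape)
result_factored = run(factored_shape)
print(result_original, result_factored)
```

Both forms return the same rows on this toy data. Whether Spark's analyzer can or should apply this factoring is not stated in this report; the sketch only shows the equivalence of the two predicate shapes.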

The query plans produced for the failing query are shown below:
{noformat}
== Parsed Logical Plan ==
'GlobalLimit 100
+- 'LocalLimit 100
   +- 'Sort ['i_product_name ASC], true
      +- 'Distinct
         +- 'Project ['i_product_name]
            +- 'Filter ((('i_manufact_id >= 738) && ('i_manufact_id <= (738 + 
40))) && (scalar-subquery#1 [] > 0))
               :  +- 'SubqueryAlias scalar-subquery#1 []
               :     +- 'Project ['count(1) AS item_cnt#0]
               :        +- 'Filter ((('i_manufact = 'i1.i_manufact) && 
((((('i_category = Women) && (('i_color = powder) || ('i_color = khaki))) && 
((('i_units = Ounce) || ('i_units = Oz)) && (('i_size = medium) || ('i_size = 
extra large)))) || ((('i_category = Women) && (('i_color = brown) || ('i_color 
= honeydew))) && ((('i_units = Bunch) || ('i_units = Ton)) && (('i_size = N/A) 
|| ('i_size = small))))) || (((('i_category = Men) && (('i_color = floral) || 
('i_color = deep))) && ((('i_units = N/A) || ('i_units = Dozen)) && (('i_size = 
petite) || ('i_size = large)))) || ((('i_category = Men) && (('i_color = light) 
|| ('i_color = cornflower))) && ((('i_units = Box) || ('i_units = Pound)) && 
(('i_size = medium) || ('i_size = extra large))))))) || (('i_manufact = 
'i1.i_manufact) && ((((('i_category = Women) && (('i_color = midnight) || 
('i_color = snow))) && ((('i_units = Pallet) || ('i_units = Gross)) && 
(('i_size = medium) || ('i_size = extra large)))) || ((('i_category = Women) && 
(('i_color = cyan) || ('i_color = papaya))) && ((('i_units = Cup) || ('i_units 
= Dram)) && (('i_size = N/A) || ('i_size = small))))) || (((('i_category = Men) 
&& (('i_color = orange) || ('i_color = frosted))) && ((('i_units = Each) || 
('i_units = Tbl)) && (('i_size = petite) || ('i_size = large)))) || 
((('i_category = Men) && (('i_color = forest) || ('i_color = ghost))) && 
((('i_units = Lb) || ('i_units = Bundle)) && (('i_size = medium) || ('i_size = 
extra large))))))))
               :           +- 'UnresolvedRelation `item`, None
               +- 'UnresolvedRelation `item`, Some(i1)

== Analyzed Logical Plan ==
i_product_name: string
GlobalLimit 100
+- LocalLimit 100
   +- Sort [i_product_name#24 ASC], true
      +- Distinct
         +- Project [i_product_name#24]
            +- Filter (((i_manufact_id#16L >= cast(738 as bigint)) && 
(i_manufact_id#16L <= cast((738 + 40) as bigint))) && (scalar-subquery#1 
[(((i_manufact#39 = i_manufact#17) && (((((i_category#37 = Women) && 
((i_color#42 = powder) || (i_color#42 = khaki))) && (((i_units#43 = Ounce) || 
(i_units#43 = Oz)) && ((i_size#40 = medium) || (i_size#40 = extra large)))) || 
(((i_category#37 = Women) && ((i_color#42 = brown) || (i_color#42 = honeydew))) 
&& (((i_units#43 = Bunch) || (i_units#43 = Ton)) && ((i_size#40 = N/A) || 
(i_size#40 = small))))) || ((((i_category#37 = Men) && ((i_color#42 = floral) 
|| (i_color#42 = deep))) && (((i_units#43 = N/A) || (i_units#43 = Dozen)) && 
((i_size#40 = petite) || (i_size#40 = large)))) || (((i_category#37 = Men) && 
((i_color#42 = light) || (i_color#42 = cornflower))) && (((i_units#43 = Box) || 
(i_units#43 = Pound)) && ((i_size#40 = medium) || (i_size#40 = extra 
large))))))) || ((i_manufact#39 = i_manufact#17) && (((((i_category#37 = Women) 
&& ((i_color#42 = midnight) || (i_color#42 = snow))) && (((i_units#43 = Pallet) 
|| (i_units#43 = Gross)) && ((i_size#40 = medium) || (i_size#40 = extra 
large)))) || (((i_category#37 = Women) && ((i_color#42 = cyan) || (i_color#42 = 
papaya))) && (((i_units#43 = Cup) || (i_units#43 = Dram)) && ((i_size#40 = N/A) 
|| (i_size#40 = small))))) || ((((i_category#37 = Men) && ((i_color#42 = 
orange) || (i_color#42 = frosted))) && (((i_units#43 = Each) || (i_units#43 = 
Tbl)) && ((i_size#40 = petite) || (i_size#40 = large)))) || (((i_category#37 = 
Men) && ((i_color#42 = forest) || (i_color#42 = ghost))) && (((i_units#43 = Lb) 
|| (i_units#43 = Bundle)) && ((i_size#40 = medium) || (i_size#40 = extra 
large))))))))] > cast(0 as bigint)))
               :  +- SubqueryAlias scalar-subquery#1 [(((i_manufact#39 = 
i_manufact#17) && (((((i_category#37 = Women) && ((i_color#42 = powder) || 
(i_color#42 = khaki))) && (((i_units#43 = Ounce) || (i_units#43 = Oz)) && 
((i_size#40 = medium) || (i_size#40 = extra large)))) || (((i_category#37 = 
Women) && ((i_color#42 = brown) || (i_color#42 = honeydew))) && (((i_units#43 = 
Bunch) || (i_units#43 = Ton)) && ((i_size#40 = N/A) || (i_size#40 = small))))) 
|| ((((i_category#37 = Men) && ((i_color#42 = floral) || (i_color#42 = deep))) 
&& (((i_units#43 = N/A) || (i_units#43 = Dozen)) && ((i_size#40 = petite) || 
(i_size#40 = large)))) || (((i_category#37 = Men) && ((i_color#42 = light) || 
(i_color#42 = cornflower))) && (((i_units#43 = Box) || (i_units#43 = Pound)) && 
((i_size#40 = medium) || (i_size#40 = extra large))))))) || ((i_manufact#39 = 
i_manufact#17) && (((((i_category#37 = Women) && ((i_color#42 = midnight) || 
(i_color#42 = snow))) && (((i_units#43 = Pallet) || (i_units#43 = Gross)) && 
((i_size#40 = medium) || (i_size#40 = extra large)))) || (((i_category#37 = 
Women) && ((i_color#42 = cyan) || (i_color#42 = papaya))) && (((i_units#43 = 
Cup) || (i_units#43 = Dram)) && ((i_size#40 = N/A) || (i_size#40 = small))))) 
|| ((((i_category#37 = Men) && ((i_color#42 = orange) || (i_color#42 = 
frosted))) && (((i_units#43 = Each) || (i_units#43 = Tbl)) && ((i_size#40 = 
petite) || (i_size#40 = large)))) || (((i_category#37 = Men) && ((i_color#42 = 
forest) || (i_color#42 = ghost))) && (((i_units#43 = Lb) || (i_units#43 = 
Bundle)) && ((i_size#40 = medium) || (i_size#40 = extra large))))))))]
               :     +- Aggregate 
[i_manufact#39,i_category#37,i_size#40,i_units#43,i_color#42], 
[(count(1),mode=Complete,isDistinct=false) AS 
item_cnt#0L,i_manufact#39,i_category#37,i_size#40,i_units#43,i_color#42]
               :        +- MetastoreRelation hadoopds1g, item, None
               +- SubqueryAlias i1
                  +- 
Relation[i_item_sk#3,i_item_id#4,i_rec_start_date#5,i_rec_end_date#6,i_item_desc#7,i_current_price#8,i_wholesale_cost#9,i_brand_id#10L,i_brand#11,i_class_id#12L,i_class#13,i_category_id#14L,i_category#15,i_manufact_id#16L,i_manufact#17,i_size#18,i_formulation#19,i_color#20,i_units#21,i_container#22,i_manager_id#23L,i_product_name#24]
 HadoopFiles

{noformat}

Note that the q41 in
https://github.com/sameeragarwal/spark/blob/tpcds-more/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/tpcds/TPCDS_1_4_Benchmark.scala
is NOT the official TPC-DS query, nor do its changes qualify as an allowed
"minor query modification". It works in the nightly build, but we cannot claim
that it is a TPC-DS query.





