Hyoungjun Kim created TAJO-1561:
-----------------------------------

             Summary: Query which contains join condition in "OR" clause does 
not finish.
                 Key: TAJO-1561
                 URL: https://issues.apache.org/jira/browse/TAJO-1561
             Project: Tajo
          Issue Type: Bug
            Reporter: Hyoungjun Kim


{code:sql}
select sum (ss_quantity)
 from store_sales, store, customer_demographics, customer_address, date_dim
 where s_store_sk = ss_store_sk
 and  ss_sold_date_sk = d_date_sk and d_year = 1998
 and  
 (
  (
   cd_demo_sk = ss_cdemo_sk
   and 
   cd_marital_status = 'M'
   and 
   cd_education_status = '4 yr Degree'
   and 
   ss_sales_price between 100.00 and 150.00  
   )
 or
  (
  cd_demo_sk = ss_cdemo_sk
   and 
   cd_marital_status = 'M'
   and 
   cd_education_status = '4 yr Degree'
   and 
   ss_sales_price between 50.00 and 100.00   
  )
 or 
 (
  cd_demo_sk = ss_cdemo_sk
  and 
   cd_marital_status = 'M'
   and 
   cd_education_status = '4 yr Degree'
   and 
   ss_sales_price between 150.00 and 200.00  
 )
 )
 and
 (
  (
  ss_addr_sk = ca_address_sk
  and
  ca_country = 'United States'
  and
  ca_state in ('KY', 'GA', 'NM')
  and ss_net_profit between 0 and 2000  
  )
 or
  (ss_addr_sk = ca_address_sk
  and
  ca_country = 'United States'
  and
  ca_state in ('MT', 'OR', 'IN')
  and ss_net_profit between 150 and 3000 
  )
 or
  (ss_addr_sk = ca_address_sk
  and
  ca_country = 'United States'
  and
  ca_state in ('WI', 'MO', 'WV')
  and ss_net_profit between 50 and 25000 
  )
 )
{code}

See the following query(TPC-DS Query48). The join condition of this query is in 
the repeated OR clause as following:
{noformat}
   cd_demo_sk = ss_cdemo_sk
   and 
   cd_marital_status = 'M'
   and 
   cd_education_status = '4 yr Degree'
{noformat}

Tajo planner make the logical for this query with CROSS JOIN because the 
planner can't find JOIN condition. This query can be changed as following.
{code:sql}
select sum (ss_quantity)
 from store_sales, store, customer_demographics, customer_address, date_dim
 where s_store_sk = ss_store_sk
 and  ss_sold_date_sk = d_date_sk and d_year = 1998
 and
(cd_demo_sk = ss_cdemo_sk
and
cd_marital_status = 'M'
and
cd_education_status = '4 yr Degree'
and (
  (ss_sales_price between 50.00 and 100.00) or
  (ss_sales_price between 100.00 and 150.00) or
  (ss_sales_price between 150.00 and 200.00)
))
and
(
ss_addr_sk = ca_address_sk
  and
  ca_country = 'United States'
  and (
   (ca_state in ('KY', 'GA', 'NM') and ss_net_profit between 0 and 2000)
   or
   (ca_state in ('MT', 'OR', 'IN') and ss_net_profit between 150 and 3000)
   or
   (ca_state in ('WI', 'MO', 'WV') and ss_net_profit between 50 and 25000)
  )
)
{code}

Other solution also have same problem. See the following issues.
- https://issues.cloudera.org/browse/IMPALA-1707 
- https://issues.apache.org/jira/browse/HIVE-7914

This issue is related TPC-DS query 13, 48, 85.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to