[ 
https://issues.apache.org/jira/browse/TAJO-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyoungjun Kim updated TAJO-1561:
--------------------------------
    Description: 
{code:sql}
select sum (ss_quantity)
 from store_sales, store, customer_demographics, customer_address, date_dim
 where s_store_sk = ss_store_sk
 and  ss_sold_date_sk = d_date_sk and d_year = 1998
 and  
 (
  (
   cd_demo_sk = ss_cdemo_sk
   and 
   cd_marital_status = 'M'
   and 
   cd_education_status = '4 yr Degree'
   and 
   ss_sales_price between 100.00 and 150.00  
   )
 or
  (
  cd_demo_sk = ss_cdemo_sk
   and 
   cd_marital_status = 'M'
   and 
   cd_education_status = '4 yr Degree'
   and 
   ss_sales_price between 50.00 and 100.00   
  )
 or 
 (
  cd_demo_sk = ss_cdemo_sk
  and 
   cd_marital_status = 'M'
   and 
   cd_education_status = '4 yr Degree'
   and 
   ss_sales_price between 150.00 and 200.00  
 )
 )
 and
 (
  (
  ss_addr_sk = ca_address_sk
  and
  ca_country = 'United States'
  and
  ca_state in ('KY', 'GA', 'NM')
  and ss_net_profit between 0 and 2000  
  )
 or
  (ss_addr_sk = ca_address_sk
  and
  ca_country = 'United States'
  and
  ca_state in ('MT', 'OR', 'IN')
  and ss_net_profit between 150 and 3000 
  )
 or
  (ss_addr_sk = ca_address_sk
  and
  ca_country = 'United States'
  and
  ca_state in ('WI', 'MO', 'WV')
  and ss_net_profit between 50 and 25000 
  )
 )
{code}

See the following query(TPC-DS Query48). The join condition of this query is in 
the repeated OR clause as following:
{noformat}
   cd_demo_sk = ss_cdemo_sk
   and 
   cd_marital_status = 'M'
   and 
   cd_education_status = '4 yr Degree'
{noformat}

Tajo planner makes the logical for this query with CROSS JOIN because the 
planner can't find JOIN condition. So this query does not finish. This query 
can be changed to the following.
{code:sql}
select sum (ss_quantity)
 from store_sales, store, customer_demographics, customer_address, date_dim
 where s_store_sk = ss_store_sk
 and  ss_sold_date_sk = d_date_sk and d_year = 1998
 and
(cd_demo_sk = ss_cdemo_sk
and
cd_marital_status = 'M'
and
cd_education_status = '4 yr Degree'
and (
  (ss_sales_price between 50.00 and 100.00) or
  (ss_sales_price between 100.00 and 150.00) or
  (ss_sales_price between 150.00 and 200.00)
))
and
(
ss_addr_sk = ca_address_sk
  and
  ca_country = 'United States'
  and (
   (ca_state in ('KY', 'GA', 'NM') and ss_net_profit between 0 and 2000)
   or
   (ca_state in ('MT', 'OR', 'IN') and ss_net_profit between 150 and 3000)
   or
   (ca_state in ('WI', 'MO', 'WV') and ss_net_profit between 50 and 25000)
  )
)
{code}

Other solution also have same problem. See the following issues.
- https://issues.cloudera.org/browse/IMPALA-1707 
- https://issues.apache.org/jira/browse/HIVE-7914

This issue is related with TPC-DS query 13, 48, 85.

  was:
{code:sql}
select sum (ss_quantity)
 from store_sales, store, customer_demographics, customer_address, date_dim
 where s_store_sk = ss_store_sk
 and  ss_sold_date_sk = d_date_sk and d_year = 1998
 and  
 (
  (
   cd_demo_sk = ss_cdemo_sk
   and 
   cd_marital_status = 'M'
   and 
   cd_education_status = '4 yr Degree'
   and 
   ss_sales_price between 100.00 and 150.00  
   )
 or
  (
  cd_demo_sk = ss_cdemo_sk
   and 
   cd_marital_status = 'M'
   and 
   cd_education_status = '4 yr Degree'
   and 
   ss_sales_price between 50.00 and 100.00   
  )
 or 
 (
  cd_demo_sk = ss_cdemo_sk
  and 
   cd_marital_status = 'M'
   and 
   cd_education_status = '4 yr Degree'
   and 
   ss_sales_price between 150.00 and 200.00  
 )
 )
 and
 (
  (
  ss_addr_sk = ca_address_sk
  and
  ca_country = 'United States'
  and
  ca_state in ('KY', 'GA', 'NM')
  and ss_net_profit between 0 and 2000  
  )
 or
  (ss_addr_sk = ca_address_sk
  and
  ca_country = 'United States'
  and
  ca_state in ('MT', 'OR', 'IN')
  and ss_net_profit between 150 and 3000 
  )
 or
  (ss_addr_sk = ca_address_sk
  and
  ca_country = 'United States'
  and
  ca_state in ('WI', 'MO', 'WV')
  and ss_net_profit between 50 and 25000 
  )
 )
{code}

See the following query(TPC-DS Query48). The join condition of this query is in 
the repeated OR clause as following:
{noformat}
   cd_demo_sk = ss_cdemo_sk
   and 
   cd_marital_status = 'M'
   and 
   cd_education_status = '4 yr Degree'
{noformat}

Tajo planner make the logical for this query with CROSS JOIN because the 
planner can't find JOIN condition. This query can be changed as following.
{code:sql}
select sum (ss_quantity)
 from store_sales, store, customer_demographics, customer_address, date_dim
 where s_store_sk = ss_store_sk
 and  ss_sold_date_sk = d_date_sk and d_year = 1998
 and
(cd_demo_sk = ss_cdemo_sk
and
cd_marital_status = 'M'
and
cd_education_status = '4 yr Degree'
and (
  (ss_sales_price between 50.00 and 100.00) or
  (ss_sales_price between 100.00 and 150.00) or
  (ss_sales_price between 150.00 and 200.00)
))
and
(
ss_addr_sk = ca_address_sk
  and
  ca_country = 'United States'
  and (
   (ca_state in ('KY', 'GA', 'NM') and ss_net_profit between 0 and 2000)
   or
   (ca_state in ('MT', 'OR', 'IN') and ss_net_profit between 150 and 3000)
   or
   (ca_state in ('WI', 'MO', 'WV') and ss_net_profit between 50 and 25000)
  )
)
{code}

Other solution also have same problem. See the following issues.
- https://issues.cloudera.org/browse/IMPALA-1707 
- https://issues.apache.org/jira/browse/HIVE-7914

This issue is related TPC-DS query 13, 48, 85.


> Query which contains join condition in "OR" clause does not finish.
> -------------------------------------------------------------------
>
>                 Key: TAJO-1561
>                 URL: https://issues.apache.org/jira/browse/TAJO-1561
>             Project: Tajo
>          Issue Type: Bug
>            Reporter: Hyoungjun Kim
>
> {code:sql}
> select sum (ss_quantity)
>  from store_sales, store, customer_demographics, customer_address, date_dim
>  where s_store_sk = ss_store_sk
>  and  ss_sold_date_sk = d_date_sk and d_year = 1998
>  and  
>  (
>   (
>    cd_demo_sk = ss_cdemo_sk
>    and 
>    cd_marital_status = 'M'
>    and 
>    cd_education_status = '4 yr Degree'
>    and 
>    ss_sales_price between 100.00 and 150.00  
>    )
>  or
>   (
>   cd_demo_sk = ss_cdemo_sk
>    and 
>    cd_marital_status = 'M'
>    and 
>    cd_education_status = '4 yr Degree'
>    and 
>    ss_sales_price between 50.00 and 100.00   
>   )
>  or 
>  (
>   cd_demo_sk = ss_cdemo_sk
>   and 
>    cd_marital_status = 'M'
>    and 
>    cd_education_status = '4 yr Degree'
>    and 
>    ss_sales_price between 150.00 and 200.00  
>  )
>  )
>  and
>  (
>   (
>   ss_addr_sk = ca_address_sk
>   and
>   ca_country = 'United States'
>   and
>   ca_state in ('KY', 'GA', 'NM')
>   and ss_net_profit between 0 and 2000  
>   )
>  or
>   (ss_addr_sk = ca_address_sk
>   and
>   ca_country = 'United States'
>   and
>   ca_state in ('MT', 'OR', 'IN')
>   and ss_net_profit between 150 and 3000 
>   )
>  or
>   (ss_addr_sk = ca_address_sk
>   and
>   ca_country = 'United States'
>   and
>   ca_state in ('WI', 'MO', 'WV')
>   and ss_net_profit between 50 and 25000 
>   )
>  )
> {code}
> See the following query(TPC-DS Query48). The join condition of this query is 
> in the repeated OR clause as following:
> {noformat}
>    cd_demo_sk = ss_cdemo_sk
>    and 
>    cd_marital_status = 'M'
>    and 
>    cd_education_status = '4 yr Degree'
> {noformat}
> Tajo planner makes the logical for this query with CROSS JOIN because the 
> planner can't find JOIN condition. So this query does not finish. This query 
> can be changed to the following.
> {code:sql}
> select sum (ss_quantity)
>  from store_sales, store, customer_demographics, customer_address, date_dim
>  where s_store_sk = ss_store_sk
>  and  ss_sold_date_sk = d_date_sk and d_year = 1998
>  and
> (cd_demo_sk = ss_cdemo_sk
> and
> cd_marital_status = 'M'
> and
> cd_education_status = '4 yr Degree'
> and (
>   (ss_sales_price between 50.00 and 100.00) or
>   (ss_sales_price between 100.00 and 150.00) or
>   (ss_sales_price between 150.00 and 200.00)
> ))
> and
> (
> ss_addr_sk = ca_address_sk
>   and
>   ca_country = 'United States'
>   and (
>    (ca_state in ('KY', 'GA', 'NM') and ss_net_profit between 0 and 2000)
>    or
>    (ca_state in ('MT', 'OR', 'IN') and ss_net_profit between 150 and 3000)
>    or
>    (ca_state in ('WI', 'MO', 'WV') and ss_net_profit between 50 and 25000)
>   )
> )
> {code}
> Other solution also have same problem. See the following issues.
> - https://issues.cloudera.org/browse/IMPALA-1707 
> - https://issues.apache.org/jira/browse/HIVE-7914
> This issue is related with TPC-DS query 13, 48, 85.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to