Hyoungjun Kim created TAJO-1561:
-----------------------------------
Summary: Query which contains join condition in "OR" clause does
not finish.
Key: TAJO-1561
URL: https://issues.apache.org/jira/browse/TAJO-1561
Project: Tajo
Issue Type: Bug
Reporter: Hyoungjun Kim
{code:sql}
select sum (ss_quantity)
from store_sales, store, customer_demographics, customer_address, date_dim
where s_store_sk = ss_store_sk
  and ss_sold_date_sk = d_date_sk and d_year = 1998
  and (
    (
      cd_demo_sk = ss_cdemo_sk
      and cd_marital_status = 'M'
      and cd_education_status = '4 yr Degree'
      and ss_sales_price between 100.00 and 150.00
    )
    or
    (
      cd_demo_sk = ss_cdemo_sk
      and cd_marital_status = 'M'
      and cd_education_status = '4 yr Degree'
      and ss_sales_price between 50.00 and 100.00
    )
    or
    (
      cd_demo_sk = ss_cdemo_sk
      and cd_marital_status = 'M'
      and cd_education_status = '4 yr Degree'
      and ss_sales_price between 150.00 and 200.00
    )
  )
  and (
    (
      ss_addr_sk = ca_address_sk
      and ca_country = 'United States'
      and ca_state in ('KY', 'GA', 'NM')
      and ss_net_profit between 0 and 2000
    )
    or
    (
      ss_addr_sk = ca_address_sk
      and ca_country = 'United States'
      and ca_state in ('MT', 'OR', 'IN')
      and ss_net_profit between 150 and 3000
    )
    or
    (
      ss_addr_sk = ca_address_sk
      and ca_country = 'United States'
      and ca_state in ('WI', 'MO', 'WV')
      and ss_net_profit between 50 and 25000
    )
  )
{code}
The query above is TPC-DS Query 48. Its join condition appears only inside the
OR clause, where it is repeated in every branch together with two filters:
{noformat}
cd_demo_sk = ss_cdemo_sk
and
cd_marital_status = 'M'
and
cd_education_status = '4 yr Degree'
{noformat}
The Tajo planner builds the logical plan for this query with a CROSS JOIN
because it cannot find the join condition inside the OR clause. The query can
be rewritten as follows, with the repeated predicates pulled out of each OR clause.
{code:sql}
select sum (ss_quantity)
from store_sales, store, customer_demographics, customer_address, date_dim
where s_store_sk = ss_store_sk
  and ss_sold_date_sk = d_date_sk and d_year = 1998
  and (
    cd_demo_sk = ss_cdemo_sk
    and cd_marital_status = 'M'
    and cd_education_status = '4 yr Degree'
    and (
      (ss_sales_price between 50.00 and 100.00) or
      (ss_sales_price between 100.00 and 150.00) or
      (ss_sales_price between 150.00 and 200.00)
    )
  )
  and (
    ss_addr_sk = ca_address_sk
    and ca_country = 'United States'
    and (
      (ca_state in ('KY', 'GA', 'NM') and ss_net_profit between 0 and 2000)
      or
      (ca_state in ('MT', 'OR', 'IN') and ss_net_profit between 150 and 3000)
      or
      (ca_state in ('WI', 'MO', 'WV') and ss_net_profit between 50 and 25000)
    )
  )
{code}
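The manual rewrite above relies on the identity (A and X) or (A and Y) = A and (X or Y). The Python sketch below shows one way such factoring could be automated. It is only an illustration, not Tajo's planner code: the function name factor_common_conjuncts is hypothetical, and predicates are compared as plain strings, whereas a real planner would compare expression trees.

```python
from typing import List, Tuple

def factor_common_conjuncts(
    branches: List[List[str]],
) -> Tuple[List[str], List[List[str]]]:
    """Split OR branches into (common conjuncts, residual OR branches).

    Each branch is a list of AND-ed predicates, given as text (an
    assumption made for this sketch). The original disjunction is
    equivalent to: AND(common) AND OR(AND(residual) for each residual).
    """
    if not branches:
        return [], []
    # Predicates that occur in every branch of the OR.
    common_set = set(branches[0]).intersection(*(set(b) for b in branches[1:]))
    # Keep the order in which they appear in the first branch.
    common = [p for p in branches[0] if p in common_set]
    residuals = [[p for p in b if p not in common_set] for b in branches]
    # A or (A and Y) == A: if one branch has nothing left, the OR adds nothing.
    if any(not r for r in residuals):
        residuals = []
    return common, residuals

# The first OR group of the query in this issue.
shared = [
    "cd_demo_sk = ss_cdemo_sk",
    "cd_marital_status = 'M'",
    "cd_education_status = '4 yr Degree'",
]
branches = [
    shared + ["ss_sales_price between 100.00 and 150.00"],
    shared + ["ss_sales_price between 50.00 and 100.00"],
    shared + ["ss_sales_price between 150.00 and 200.00"],
]
common, residuals = factor_common_conjuncts(branches)
print(common)
print(residuals)
```

After this step the join condition cd_demo_sk = ss_cdemo_sk is a top-level conjunct, which is where the description above implies the planner looks for join conditions.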
Other systems have the same problem. See the following issues.
- https://issues.cloudera.org/browse/IMPALA-1707
- https://issues.apache.org/jira/browse/HIVE-7914
This issue affects TPC-DS queries 13, 48, and 85.