[
https://issues.apache.org/jira/browse/TAJO-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hyoungjun Kim updated TAJO-1561:
--------------------------------
Description:
{code:sql}
select sum (ss_quantity)
from store_sales, store, customer_demographics, customer_address, date_dim
where s_store_sk = ss_store_sk
and ss_sold_date_sk = d_date_sk and d_year = 1998
and
(
(
cd_demo_sk = ss_cdemo_sk
and
cd_marital_status = 'M'
and
cd_education_status = '4 yr Degree'
and
ss_sales_price between 100.00 and 150.00
)
or
(
cd_demo_sk = ss_cdemo_sk
and
cd_marital_status = 'M'
and
cd_education_status = '4 yr Degree'
and
ss_sales_price between 50.00 and 100.00
)
or
(
cd_demo_sk = ss_cdemo_sk
and
cd_marital_status = 'M'
and
cd_education_status = '4 yr Degree'
and
ss_sales_price between 150.00 and 200.00
)
)
and
(
(
ss_addr_sk = ca_address_sk
and
ca_country = 'United States'
and
ca_state in ('KY', 'GA', 'NM')
and ss_net_profit between 0 and 2000
)
or
(ss_addr_sk = ca_address_sk
and
ca_country = 'United States'
and
ca_state in ('MT', 'OR', 'IN')
and ss_net_profit between 150 and 3000
)
or
(ss_addr_sk = ca_address_sk
and
ca_country = 'United States'
and
ca_state in ('WI', 'MO', 'WV')
and ss_net_profit between 50 and 25000
)
)
{code}
See the following query(TPC-DS Query48). The join condition of this query is in
the repeated OR clause as following:
{noformat}
cd_demo_sk = ss_cdemo_sk
and
cd_marital_status = 'M'
and
cd_education_status = '4 yr Degree'
{noformat}
Tajo planner makes the logical for this query with CROSS JOIN because the
planner can't find JOIN condition. So this query does not finish. This query
can be changed to the following.
{code:sql}
select sum (ss_quantity)
from store_sales, store, customer_demographics, customer_address, date_dim
where s_store_sk = ss_store_sk
and ss_sold_date_sk = d_date_sk and d_year = 1998
and
(cd_demo_sk = ss_cdemo_sk
and
cd_marital_status = 'M'
and
cd_education_status = '4 yr Degree'
and (
(ss_sales_price between 50.00 and 100.00) or
(ss_sales_price between 100.00 and 150.00) or
(ss_sales_price between 150.00 and 200.00)
))
and
(
ss_addr_sk = ca_address_sk
and
ca_country = 'United States'
and (
(ca_state in ('KY', 'GA', 'NM') and ss_net_profit between 0 and 2000)
or
(ca_state in ('MT', 'OR', 'IN') and ss_net_profit between 150 and 3000)
or
(ca_state in ('WI', 'MO', 'WV') and ss_net_profit between 50 and 25000)
)
)
{code}
Other solution also have same problem. See the following issues.
- https://issues.cloudera.org/browse/IMPALA-1707
- https://issues.apache.org/jira/browse/HIVE-7914
This issue is related with TPC-DS query 13, 48, 85.
was:
{code:sql}
select sum (ss_quantity)
from store_sales, store, customer_demographics, customer_address, date_dim
where s_store_sk = ss_store_sk
and ss_sold_date_sk = d_date_sk and d_year = 1998
and
(
(
cd_demo_sk = ss_cdemo_sk
and
cd_marital_status = 'M'
and
cd_education_status = '4 yr Degree'
and
ss_sales_price between 100.00 and 150.00
)
or
(
cd_demo_sk = ss_cdemo_sk
and
cd_marital_status = 'M'
and
cd_education_status = '4 yr Degree'
and
ss_sales_price between 50.00 and 100.00
)
or
(
cd_demo_sk = ss_cdemo_sk
and
cd_marital_status = 'M'
and
cd_education_status = '4 yr Degree'
and
ss_sales_price between 150.00 and 200.00
)
)
and
(
(
ss_addr_sk = ca_address_sk
and
ca_country = 'United States'
and
ca_state in ('KY', 'GA', 'NM')
and ss_net_profit between 0 and 2000
)
or
(ss_addr_sk = ca_address_sk
and
ca_country = 'United States'
and
ca_state in ('MT', 'OR', 'IN')
and ss_net_profit between 150 and 3000
)
or
(ss_addr_sk = ca_address_sk
and
ca_country = 'United States'
and
ca_state in ('WI', 'MO', 'WV')
and ss_net_profit between 50 and 25000
)
)
{code}
See the following query(TPC-DS Query48). The join condition of this query is in
the repeated OR clause as following:
{noformat}
cd_demo_sk = ss_cdemo_sk
and
cd_marital_status = 'M'
and
cd_education_status = '4 yr Degree'
{noformat}
Tajo planner make the logical for this query with CROSS JOIN because the
planner can't find JOIN condition. This query can be changed as following.
{code:sql}
select sum (ss_quantity)
from store_sales, store, customer_demographics, customer_address, date_dim
where s_store_sk = ss_store_sk
and ss_sold_date_sk = d_date_sk and d_year = 1998
and
(cd_demo_sk = ss_cdemo_sk
and
cd_marital_status = 'M'
and
cd_education_status = '4 yr Degree'
and (
(ss_sales_price between 50.00 and 100.00) or
(ss_sales_price between 100.00 and 150.00) or
(ss_sales_price between 150.00 and 200.00)
))
and
(
ss_addr_sk = ca_address_sk
and
ca_country = 'United States'
and (
(ca_state in ('KY', 'GA', 'NM') and ss_net_profit between 0 and 2000)
or
(ca_state in ('MT', 'OR', 'IN') and ss_net_profit between 150 and 3000)
or
(ca_state in ('WI', 'MO', 'WV') and ss_net_profit between 50 and 25000)
)
)
{code}
Other solution also have same problem. See the following issues.
- https://issues.cloudera.org/browse/IMPALA-1707
- https://issues.apache.org/jira/browse/HIVE-7914
This issue is related TPC-DS query 13, 48, 85.
> Query which contains join condition in "OR" clause does not finish.
> -------------------------------------------------------------------
>
> Key: TAJO-1561
> URL: https://issues.apache.org/jira/browse/TAJO-1561
> Project: Tajo
> Issue Type: Bug
> Reporter: Hyoungjun Kim
>
> {code:sql}
> select sum (ss_quantity)
> from store_sales, store, customer_demographics, customer_address, date_dim
> where s_store_sk = ss_store_sk
> and ss_sold_date_sk = d_date_sk and d_year = 1998
> and
> (
> (
> cd_demo_sk = ss_cdemo_sk
> and
> cd_marital_status = 'M'
> and
> cd_education_status = '4 yr Degree'
> and
> ss_sales_price between 100.00 and 150.00
> )
> or
> (
> cd_demo_sk = ss_cdemo_sk
> and
> cd_marital_status = 'M'
> and
> cd_education_status = '4 yr Degree'
> and
> ss_sales_price between 50.00 and 100.00
> )
> or
> (
> cd_demo_sk = ss_cdemo_sk
> and
> cd_marital_status = 'M'
> and
> cd_education_status = '4 yr Degree'
> and
> ss_sales_price between 150.00 and 200.00
> )
> )
> and
> (
> (
> ss_addr_sk = ca_address_sk
> and
> ca_country = 'United States'
> and
> ca_state in ('KY', 'GA', 'NM')
> and ss_net_profit between 0 and 2000
> )
> or
> (ss_addr_sk = ca_address_sk
> and
> ca_country = 'United States'
> and
> ca_state in ('MT', 'OR', 'IN')
> and ss_net_profit between 150 and 3000
> )
> or
> (ss_addr_sk = ca_address_sk
> and
> ca_country = 'United States'
> and
> ca_state in ('WI', 'MO', 'WV')
> and ss_net_profit between 50 and 25000
> )
> )
> {code}
> See the following query(TPC-DS Query48). The join condition of this query is
> in the repeated OR clause as following:
> {noformat}
> cd_demo_sk = ss_cdemo_sk
> and
> cd_marital_status = 'M'
> and
> cd_education_status = '4 yr Degree'
> {noformat}
> Tajo planner makes the logical for this query with CROSS JOIN because the
> planner can't find JOIN condition. So this query does not finish. This query
> can be changed to the following.
> {code:sql}
> select sum (ss_quantity)
> from store_sales, store, customer_demographics, customer_address, date_dim
> where s_store_sk = ss_store_sk
> and ss_sold_date_sk = d_date_sk and d_year = 1998
> and
> (cd_demo_sk = ss_cdemo_sk
> and
> cd_marital_status = 'M'
> and
> cd_education_status = '4 yr Degree'
> and (
> (ss_sales_price between 50.00 and 100.00) or
> (ss_sales_price between 100.00 and 150.00) or
> (ss_sales_price between 150.00 and 200.00)
> ))
> and
> (
> ss_addr_sk = ca_address_sk
> and
> ca_country = 'United States'
> and (
> (ca_state in ('KY', 'GA', 'NM') and ss_net_profit between 0 and 2000)
> or
> (ca_state in ('MT', 'OR', 'IN') and ss_net_profit between 150 and 3000)
> or
> (ca_state in ('WI', 'MO', 'WV') and ss_net_profit between 50 and 25000)
> )
> )
> {code}
> Other solution also have same problem. See the following issues.
> - https://issues.cloudera.org/browse/IMPALA-1707
> - https://issues.apache.org/jira/browse/HIVE-7914
> This issue is related with TPC-DS query 13, 48, 85.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)