Jaehwa Jung created TAJO-750:
--------------------------------
Summary: Join orders affects abnormal to the result data.
Key: TAJO-750
URL: https://issues.apache.org/jira/browse/TAJO-750
Project: Tajo
Issue Type: Sub-task
Reporter: Jaehwa Jung
I found that the join order abnormally affects the result data, as follows:
*Environment*
* DataSet: TPC-DS
*Case 1*
{code:sql}
SELECT COUNT(*)
FROM (SELECT cs.cs_item_sk as cs_item_sk,
             cs.cs_ext_discount_amt as cs_ext_discount_amt
      FROM catalog_sales cs
      JOIN date_dim d ON (d.d_date_sk = cs.cs_sold_date_sk)
      WHERE d.d_date between '2000-01-27' and '2000-04-27') cs1
JOIN item i ON (i.i_item_sk = cs1.cs_item_sk)
JOIN (SELECT cs2.cs_item_sk as cs_item_sk,
             1.3 * avg(cs_ext_discount_amt) as avg_cs_ext_discount_amt
      FROM (SELECT cs.cs_item_sk as cs_item_sk,
                   cs.cs_ext_discount_amt as cs_ext_discount_amt
            FROM catalog_sales cs
            JOIN date_dim d ON (d.d_date_sk = cs.cs_sold_date_sk)
            WHERE d.d_date between '2000-01-27' and '2000-04-27') cs2
      GROUP BY cs2.cs_item_sk) tmp1
ON (i.i_item_sk = tmp1.cs_item_sk);
{code}
- expected result: 71147
- actual result: 4163848
*Case 2*
{code:sql}
SELECT COUNT(*)
FROM item i
JOIN (SELECT cs.cs_item_sk as cs_item_sk,
             cs.cs_ext_discount_amt as cs_ext_discount_amt
      FROM catalog_sales cs
      JOIN date_dim d ON (d.d_date_sk = cs.cs_sold_date_sk)
      WHERE d.d_date between '2000-01-27' and '2000-04-27') cs1
ON (i.i_item_sk = cs1.cs_item_sk)
JOIN (SELECT cs2.cs_item_sk as cs_item_sk,
             1.3 * avg(cs_ext_discount_amt) as avg_cs_ext_discount_amt
      FROM (SELECT cs.cs_item_sk as cs_item_sk,
                   cs.cs_ext_discount_amt as cs_ext_discount_amt
            FROM catalog_sales cs
            JOIN date_dim d ON (d.d_date_sk = cs.cs_sold_date_sk)
            WHERE d.d_date between '2000-01-27' and '2000-04-27') cs2
      GROUP BY cs2.cs_item_sk) tmp1
ON (i.i_item_sk = tmp1.cs_item_sk);
{code}
- expected result: 23890
- actual result: 4163848
As you can see, neither query returns the expected result. Furthermore, the
results of the two queries differ.
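
For a smaller reproduction, the following is a minimal sketch that runs both join orders against a tiny in-memory data set and compares the counts. It uses SQLite (through Python's standard sqlite3 module) only as a stand-in reference engine, not Tajo. The three tables are cut down to the columns the queries touch, and every row is a hypothetical toy value rather than TPC-DS data. Since both cases use only inner joins with the same predicates, an engine should return the same count for either order.

```python
import sqlite3

# Assumption: toy schema and rows for illustration only; the report used TPC-DS.
SCHEMA = """
CREATE TABLE item (i_item_sk INTEGER);
CREATE TABLE date_dim (d_date_sk INTEGER, d_date TEXT);
CREATE TABLE catalog_sales (cs_item_sk INTEGER, cs_sold_date_sk INTEGER,
                            cs_ext_discount_amt REAL);
INSERT INTO item VALUES (1), (2), (3);
INSERT INTO date_dim VALUES
  (10, '2000-01-27'), (11, '2000-03-01'), (12, '2000-05-01');
INSERT INTO catalog_sales VALUES
  (1, 10, 5.0), (1, 11, 7.0), (2, 11, 3.0), (2, 12, 9.0), (3, 12, 4.0);
"""

# Sales restricted to the date range (the cs1 / inner cs2 subquery in the report).
SALES = """SELECT cs.cs_item_sk AS cs_item_sk,
       cs.cs_ext_discount_amt AS cs_ext_discount_amt
FROM catalog_sales cs
JOIN date_dim d ON (d.d_date_sk = cs.cs_sold_date_sk)
WHERE d.d_date BETWEEN '2000-01-27' AND '2000-04-27'"""

# Per-item scaled average discount (the tmp1 subquery in the report).
TMP1 = f"""SELECT cs2.cs_item_sk AS cs_item_sk,
       1.3 * avg(cs_ext_discount_amt) AS avg_cs_ext_discount_amt
FROM ({SALES}) cs2
GROUP BY cs2.cs_item_sk"""

# Case 1: the sales subquery comes first, then item, then tmp1.
CASE_1 = f"""SELECT COUNT(*) FROM ({SALES}) cs1
JOIN item i ON (i.i_item_sk = cs1.cs_item_sk)
JOIN ({TMP1}) tmp1 ON (i.i_item_sk = tmp1.cs_item_sk)"""

# Case 2: item comes first, then the sales subquery, then tmp1.
CASE_2 = f"""SELECT COUNT(*) FROM item i
JOIN ({SALES}) cs1 ON (i.i_item_sk = cs1.cs_item_sk)
JOIN ({TMP1}) tmp1 ON (i.i_item_sk = tmp1.cs_item_sk)"""


def run_counts():
    """Load the toy data and return (case 1 count, case 2 count)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    count_1 = conn.execute(CASE_1).fetchone()[0]
    count_2 = conn.execute(CASE_2).fetchone()[0]
    conn.close()
    return count_1, count_2


counts = run_counts()
print(counts)
```

Running the same two statements against Tajo with equivalent toy tables would show whether the mismatch also appears on small inputs or only at TPC-DS scale.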