[ https://issues.apache.org/jira/browse/TAJO-741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jaehwa Jung updated TAJO-741:
-----------------------------
Attachment: TAJO-741_2.patch
I updated the review request against the master branch on Review Board:
https://reviews.apache.org/r/20125/
I'm glad to upload a second patch, because this issue turned out to be tricky as well.
This issue covers several bugs:
* JoinOptimization removes two more join conditions.
* JoinOptimization removes a join condition that uses a constant.
* HashJoin can't find theta join conditions.
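All three bugs amount to a predicate that exists before optimization and is missing afterwards. A minimal sketch of a check for that follows, assuming a hypothetical `PlanNode` type and predicates represented as plain strings. None of these names are Tajo classes; the predicates are taken from the small three-table query in the quoted description.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Set


@dataclass
class PlanNode:
    """Hypothetical plan node (not a Tajo class): a scan or a binary join."""
    name: str
    predicates: FrozenSet[str] = frozenset()
    left: Optional["PlanNode"] = None
    right: Optional["PlanNode"] = None


def collect_predicates(node: Optional[PlanNode]) -> Set[str]:
    """Gather every predicate attached anywhere in the plan tree."""
    if node is None:
        return set()
    return (set(node.predicates)
            | collect_predicates(node.left)
            | collect_predicates(node.right))


def missing_predicates(before: PlanNode, after: PlanNode) -> Set[str]:
    """Predicates present in the plan before reordering but absent after it."""
    return collect_predicates(before) - collect_predicates(after)


a, b, c = PlanNode("a"), PlanNode("b"), PlanNode("c")

# Plan before join ordering: all three predicates are attached.
before = PlanNode(
    "join(join(a, b), c)",
    frozenset({"b.id = c.id", "a.type = c.type"}),
    left=PlanNode("join(a, b)", frozenset({"a.id = b.id"}), left=a, right=b),
    right=c,
)

# Plan after join ordering, modelling the reported symptom:
# the a.type = c.type predicate is no longer attached to any node.
after = PlanNode(
    "join(join(a, b), c)",
    frozenset({"b.id = c.id"}),
    left=PlanNode("join(a, b)", frozenset({"a.id = b.id"}), left=a, right=b),
    right=c,
)

lost = missing_predicates(before, after)
print(lost)
```

A check of this shape only detects a dropped predicate; it says nothing about where in the reordered plan the predicate should be attached.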
For reference, testing on the TPC-DS dataset reproduces the bugs above as follows:
*Case1*
{code:xml}
SELECT COUNT(*)
FROM (
  SELECT cs.cs_item_sk AS cs_item_sk,
         cs.cs_ext_discount_amt AS cs_ext_discount_amt
  FROM catalog_sales cs
  JOIN date_dim d ON (d.d_date_sk = cs.cs_sold_date_sk)
  WHERE d.d_date BETWEEN '2000-01-27' AND '2000-04-27'
) cs1
JOIN item i ON (i.i_item_sk = cs1.cs_item_sk)
WHERE cs1.cs_item_sk = 20864;
{code}
* Expected result: 40
* Actual result: 4163848
*Case2*
{code:xml}
SELECT COUNT(*)
FROM (
  SELECT cs.cs_item_sk AS cs_item_sk,
         cs.cs_ext_discount_amt AS cs_ext_discount_amt
  FROM catalog_sales cs
  JOIN date_dim d ON (d.d_date_sk = cs.cs_sold_date_sk)
  WHERE d.d_date BETWEEN '2000-01-27' AND '2000-04-27'
) cs1
JOIN item i ON (i.i_item_sk = cs1.cs_item_sk)
WHERE cs1.cs_item_sk = 20864
  AND i.i_manufact_id = 436;
{code}
* Expected result: 40
* Actual result: 4586
> GreedyHeuristicJoinOrderAlgorithm removes some join pairs.
> ----------------------------------------------------------
>
> Key: TAJO-741
> URL: https://issues.apache.org/jira/browse/TAJO-741
> Project: Tajo
> Issue Type: Sub-task
> Components: distributed query plan
> Reporter: Jaehwa Jung
> Assignee: Jaehwa Jung
> Attachments: TAJO-741.patch, TAJO-741_2.patch
>
>
> I found a bug in GreedyHeuristicJoinOrderAlgorithm, as follows:
> *1. Table Schema*
> {code:xml}
> create external table table1 (id int, name text, score float, type text)
> using csv with ('csvfile.delimiter'='|') location
> 'hdfs://server01:9010/tajo/warehouse/table1' ;
> create external table table3 (id int, name text, score float, type text)
> using csv with ('csvfile.delimiter'='|') location
> 'hdfs://localhost:9010/tajo/warehouse/table3' ;
> create external table table4 (id int, name text, score float, type text)
> using csv with ('csvfile.delimiter'='|') location
> 'hdfs://localhost:9010/tajo/warehouse/table4' ;
> {code}
> *2. Table Data*
> {code:xml}
> 2.1 table1
> 1|name1-1|1.1|a
> 2|name1-2|2.3|b
> 3|name1-3|3.4|c
> 4|name1-4|4.5|d
> 5|name1-5|5.6|e
> 2.2 table3
> 1|name3-1|0.1|a
> 2|name3-2|0.2|b
> 3|name3-3|0.3|b
> 2.3 table4
> 1|name4-1|22.3|a
> 2|name4-2|23.4|b
> 3|name4-3|24.5|cc
> 5|name4-4|25.6|ee
> 6|name4-5|31.1|ff
> 7|name4-6|32.3|gg
> {code}
> *3. Query*
> {code:xml}
> select a.name, c.name, a.type, c.type
> from table1 a
> join table3 b on (a.id = b.id)
> join table4 c on (b.id = c.id)
> where a.type = c.type;
> {code}
> *4. Expected Result*
> {code:xml}
> name, name, type, type
> -------------------------------
> name1-1, name4-1, a, a
> name1-2, name4-2, b, b
> {code}
> *5. Actual Result*
> {code:xml}
> name, name, type, type
> -------------------------------
> name1-1, name4-1, a, a
> name1-2, name4-2, b, b
> name1-3, name4-3, c, cc
> {code}
> I found that _default.a.type (TEXT) = default.c.type (TEXT)_ is present in the
> initial plan and in the FPD result,
> but after GreedyHeuristicJoinOrderAlgorithm runs, it disappears from the
> logical plan.
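The expected and actual results in the quoted description can be checked by hand against the sample rows. The sketch below replays the three-table query in memory with plain nested loops, once with the `a.type = c.type` predicate and once without it. It is an illustration only and does not model Tajo's planner or executor; `run_query` and its flag are hypothetical names.

```python
# Sample rows copied from the issue description: (id, name, score, type)
table1 = [
    (1, "name1-1", 1.1, "a"),
    (2, "name1-2", 2.3, "b"),
    (3, "name1-3", 3.4, "c"),
    (4, "name1-4", 4.5, "d"),
    (5, "name1-5", 5.6, "e"),
]
table3 = [
    (1, "name3-1", 0.1, "a"),
    (2, "name3-2", 0.2, "b"),
    (3, "name3-3", 0.3, "b"),
]
table4 = [
    (1, "name4-1", 22.3, "a"),
    (2, "name4-2", 23.4, "b"),
    (3, "name4-3", 24.5, "cc"),
    (5, "name4-4", 25.6, "ee"),
    (6, "name4-5", 31.1, "ff"),
    (7, "name4-6", 32.3, "gg"),
]


def run_query(keep_type_predicate):
    """Evaluate the query from the description with nested loops.

    keep_type_predicate=False models a plan in which the
    a.type = c.type condition has been dropped.
    """
    rows = []
    for a in table1:
        for b in table3:
            if a[0] != b[0]:          # a.id = b.id
                continue
            for c in table4:
                if b[0] != c[0]:      # b.id = c.id
                    continue
                if keep_type_predicate and a[3] != c[3]:  # a.type = c.type
                    continue
                rows.append((a[1], c[1], a[3], c[3]))
    return rows


with_predicate = run_query(keep_type_predicate=True)
without_predicate = run_query(keep_type_predicate=False)

for row in with_predicate:
    print("with predicate:   ", row)
for row in without_predicate:
    print("without predicate:", row)
```

With the predicate, the rows match the expected result in section 4. Without it, the extra `name1-3, name4-3, c, cc` row from section 5 appears, which is consistent with the condition having been lost during join ordering.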