[
https://issues.apache.org/jira/browse/TAJO-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14584744#comment-14584744
]
ASF GitHub Bot commented on TAJO-1352:
--------------------------------------
Github user jihoonson commented on the pull request:
https://github.com/apache/tajo/pull/593#issuecomment-111736855
Here is the simple evaluation result.
### Query
```
select
count(*)
from
lineitem left outer join orders
on l_orderkey = o_orderkey
left outer join partsupp
on ps_suppkey = o_custkey
left outer join customer
on ps_suppkey = c_custkey
left outer join part
on p_partkey = c_nationkey
```
### Data
* TPC-H of scale factor 100
### Cluster
* One master and 3 workers
* Each worker equips 4 cores, 8 GB memory, and 2 HDDs.
### Performance comparison
* Elapsed time
* Before patch: 25 mins, 23 sec
* After patch: 19 mins, 59 sec
* Performance improvement ratio: about 20%
### Query plan
The query execution time is reduced due to the improved query plan as
follows.
#### Before patch
```
-----------------------------
Query Block Graph
-----------------------------
|-#ROOT
-----------------------------
Optimization Log:
[LogicalPlan]
> ProjectionNode is eliminated.
[#ROOT]
> Non-optimized join order: ((((tpch100.lineitem ⟕ tpch100.orders) ⟕
tpch100.partsupp) ⟕ tpch100.customer) ⟕ tpch100.part) (cost:
1.0447965953264456E46)
> Optimized join order : ((((tpch100.lineitem ⟕ tpch100.orders) ⟕
tpch100.partsupp) ⟕ tpch100.customer) ⟕ tpch100.part) (cost:
1.0447965859707236E46)
-----------------------------
GROUP_BY(10)()
=> exprs: (count())
=> target list: ?count (INT8)
=> out schema:{(1) ?count (INT8)}
=> in schema:{(0) }
JOIN(15)(LEFT_OUTER)
=> Join Cond: tpch100.part.p_partkey (INT8) =
tpch100.customer.c_nationkey (INT8)
=> target list:
=> out schema: {(0) }
=> in schema: {(2) tpch100.customer.c_nationkey (INT8),
tpch100.part.p_partkey (INT8)}
SCAN(7) on tpch100.part
=> target list: tpch100.part.p_partkey (INT8)
=> out schema: {(1) tpch100.part.p_partkey (INT8)}
=> in schema: {(9) tpch100.part.p_partkey (INT8),
tpch100.part.p_name (TEXT), tpch100.part.p_mfgr (TEXT), tpch100.part.p_brand
(TEXT), tpch100.part.p_type (TEXT), tpch100.part.p_size (INT4),
tpch100.part.p_container (TEXT), tpch100.part.p_retailprice (FLOAT8),
tpch100.part.p_comment (TEXT)}
JOIN(14)(LEFT_OUTER)
=> Join Cond: tpch100.partsupp.ps_suppkey (INT8) =
tpch100.customer.c_custkey (INT8)
=> target list: tpch100.customer.c_nationkey (INT8)
=> out schema: {(1) tpch100.customer.c_nationkey (INT8)}
=> in schema: {(3) tpch100.partsupp.ps_suppkey (INT8),
tpch100.customer.c_nationkey (INT8), tpch100.customer.c_custkey (INT8)}
SCAN(5) on tpch100.customer
=> target list: tpch100.customer.c_nationkey (INT8),
tpch100.customer.c_custkey (INT8)
=> out schema: {(2) tpch100.customer.c_nationkey (INT8),
tpch100.customer.c_custkey (INT8)}
=> in schema: {(8) tpch100.customer.c_custkey (INT8),
tpch100.customer.c_name (TEXT), tpch100.customer.c_address (TEXT),
tpch100.customer.c_nationkey (INT8), tpch100.customer.c_phone (TEXT),
tpch100.customer.c_acctbal (FLOAT8), tpch100.customer.c_mktsegment (TEXT),
tpch100.customer.c_comment (TEXT)}
JOIN(13)(LEFT_OUTER)
=> Join Cond: tpch100.partsupp.ps_suppkey (INT8) =
tpch100.orders.o_custkey (INT8)
=> target list: tpch100.partsupp.ps_suppkey (INT8)
=> out schema: {(1) tpch100.partsupp.ps_suppkey (INT8)}
=> in schema: {(2) tpch100.orders.o_custkey (INT8),
tpch100.partsupp.ps_suppkey (INT8)}
SCAN(3) on tpch100.partsupp
=> target list: tpch100.partsupp.ps_suppkey (INT8)
=> out schema: {(1) tpch100.partsupp.ps_suppkey (INT8)}
=> in schema: {(5) tpch100.partsupp.ps_partkey (INT8),
tpch100.partsupp.ps_suppkey (INT8), tpch100.partsupp.ps_availqty (INT4),
tpch100.partsupp.ps_supplycost (FLOAT8), tpch100.partsupp.ps_comment (TEXT)}
JOIN(12)(LEFT_OUTER)
=> Join Cond: tpch100.lineitem.l_orderkey (INT8) =
tpch100.orders.o_orderkey (INT8)
=> target list: tpch100.orders.o_custkey (INT8)
=> out schema: {(1) tpch100.orders.o_custkey (INT8)}
=> in schema: {(3) tpch100.lineitem.l_orderkey (INT8),
tpch100.orders.o_custkey (INT8), tpch100.orders.o_orderkey (INT8)}
SCAN(1) on tpch100.orders
=> target list: tpch100.orders.o_custkey (INT8),
tpch100.orders.o_orderkey (INT8)
=> out schema: {(2) tpch100.orders.o_custkey (INT8),
tpch100.orders.o_orderkey (INT8)}
=> in schema: {(9) tpch100.orders.o_orderkey (INT8),
tpch100.orders.o_custkey (INT8), tpch100.orders.o_orderstatus (TEXT),
tpch100.orders.o_totalprice (FLOAT8), tpch100.orders.o_orderdate (DATE),
tpch100.orders.o_orderpriority (TEXT), tpch100.orders.o_clerk (TEXT),
tpch100.orders.o_shippriority (INT4), tpch100.orders.o_comment (TEXT)}
SCAN(0) on tpch100.lineitem
=> target list: tpch100.lineitem.l_orderkey (INT8)
=> out schema: {(1) tpch100.lineitem.l_orderkey (INT8)}
=> in schema: {(16) tpch100.lineitem.l_orderkey (INT8),
tpch100.lineitem.l_partkey (INT8), tpch100.lineitem.l_suppkey (INT8),
tpch100.lineitem.l_linenumber (INT8), tpch100.lineitem.l_quantity (FLOAT8),
tpch100.lineitem.l_extendedprice (FLOAT8), tpch100.lineitem.l_discount
(FLOAT8), tpch100.lineitem.l_tax (FLOAT8), tpch100.lineitem.l_returnflag
(TEXT), tpch100.lineitem.l_linestatus (TEXT), tpch100.lineitem.l_shipdate
(DATE), tpch100.lineitem.l_commitdate (DATE), tpch100.lineitem.l_receiptdate
(DATE), tpch100.lineitem.l_shipinstruct (TEXT), tpch100.lineitem.l_shipmode
(TEXT), tpch100.lineitem.l_comment (TEXT)}
```
#### After patch
```
-----------------------------
Query Block Graph
-----------------------------
|-#ROOT
-----------------------------
Optimization Log:
[LogicalPlan]
> ProjectionNode is eliminated.
[#ROOT]
> Non-optimized join order: ((((tpch100.lineitem ⟕ tpch100.orders) ⟕
tpch100.partsupp) ⟕ tpch100.customer) ⟕ tpch100.part) (cost:
7.933924122078524E46)
> Optimized join order : ((tpch100.lineitem ⟕ (tpch100.orders ⟕
tpch100.partsupp)) ⟕ (tpch100.customer ⟕ tpch100.part)) (cost:
4.016549062824562E47)
-----------------------------
GROUP_BY(10)()
=> exprs: (count())
=> target list: ?count (INT8)
=> out schema:{(1) ?count (INT8)}
=> in schema:{(0) }
JOIN(15)(LEFT_OUTER)
=> Join Cond: tpch100.partsupp.ps_suppkey (INT8) =
tpch100.customer.c_custkey (INT8)
=> target list:
=> out schema: {(0) }
=> in schema: {(2) tpch100.partsupp.ps_suppkey (INT8),
tpch100.customer.c_custkey (INT8)}
JOIN(14)(LEFT_OUTER)
=> Join Cond: tpch100.part.p_partkey (INT8) =
tpch100.customer.c_nationkey (INT8)
=> target list: tpch100.customer.c_custkey (INT8)
=> out schema: {(1) tpch100.customer.c_custkey (INT8)}
=> in schema: {(3) tpch100.customer.c_custkey (INT8),
tpch100.customer.c_nationkey (INT8), tpch100.part.p_partkey (INT8)}
SCAN(7) on tpch100.part
=> target list: tpch100.part.p_partkey (INT8)
=> out schema: {(1) tpch100.part.p_partkey (INT8)}
=> in schema: {(9) tpch100.part.p_partkey (INT8),
tpch100.part.p_name (TEXT), tpch100.part.p_mfgr (TEXT), tpch100.part.p_brand
(TEXT), tpch100.part.p_type (TEXT), tpch100.part.p_size (INT4),
tpch100.part.p_container (TEXT), tpch100.part.p_retailprice (FLOAT8),
tpch100.part.p_comment (TEXT)}
SCAN(5) on tpch100.customer
=> target list: tpch100.customer.c_custkey (INT8),
tpch100.customer.c_nationkey (INT8)
=> out schema: {(2) tpch100.customer.c_custkey (INT8),
tpch100.customer.c_nationkey (INT8)}
=> in schema: {(8) tpch100.customer.c_custkey (INT8),
tpch100.customer.c_name (TEXT), tpch100.customer.c_address (TEXT),
tpch100.customer.c_nationkey (INT8), tpch100.customer.c_phone (TEXT),
tpch100.customer.c_acctbal (FLOAT8), tpch100.customer.c_mktsegment (TEXT),
tpch100.customer.c_comment (TEXT)}
JOIN(13)(LEFT_OUTER)
=> Join Cond: tpch100.lineitem.l_orderkey (INT8) =
tpch100.orders.o_orderkey (INT8)
=> target list: tpch100.partsupp.ps_suppkey (INT8)
=> out schema: {(1) tpch100.partsupp.ps_suppkey (INT8)}
=> in schema: {(3) tpch100.lineitem.l_orderkey (INT8),
tpch100.partsupp.ps_suppkey (INT8), tpch100.orders.o_orderkey (INT8)}
JOIN(12)(LEFT_OUTER)
=> Join Cond: tpch100.partsupp.ps_suppkey (INT8) =
tpch100.orders.o_custkey (INT8)
=> target list: tpch100.partsupp.ps_suppkey (INT8),
tpch100.orders.o_orderkey (INT8)
=> out schema: {(2) tpch100.partsupp.ps_suppkey (INT8),
tpch100.orders.o_orderkey (INT8)}
=> in schema: {(3) tpch100.orders.o_orderkey (INT8),
tpch100.orders.o_custkey (INT8), tpch100.partsupp.ps_suppkey (INT8)}
SCAN(3) on tpch100.partsupp
=> target list: tpch100.partsupp.ps_suppkey (INT8)
=> out schema: {(1) tpch100.partsupp.ps_suppkey (INT8)}
=> in schema: {(5) tpch100.partsupp.ps_partkey (INT8),
tpch100.partsupp.ps_suppkey (INT8), tpch100.partsupp.ps_availqty (INT4),
tpch100.partsupp.ps_supplycost (FLOAT8), tpch100.partsupp.ps_comment (TEXT)}
SCAN(1) on tpch100.orders
=> target list: tpch100.orders.o_orderkey (INT8),
tpch100.orders.o_custkey (INT8)
=> out schema: {(2) tpch100.orders.o_orderkey (INT8),
tpch100.orders.o_custkey (INT8)}
=> in schema: {(9) tpch100.orders.o_orderkey (INT8),
tpch100.orders.o_custkey (INT8), tpch100.orders.o_orderstatus (TEXT),
tpch100.orders.o_totalprice (FLOAT8), tpch100.orders.o_orderdate (DATE),
tpch100.orders.o_orderpriority (TEXT), tpch100.orders.o_clerk (TEXT),
tpch100.orders.o_shippriority (INT4), tpch100.orders.o_comment (TEXT)}
SCAN(0) on tpch100.lineitem
=> target list: tpch100.lineitem.l_orderkey (INT8)
=> out schema: {(1) tpch100.lineitem.l_orderkey (INT8)}
=> in schema: {(16) tpch100.lineitem.l_orderkey (INT8),
tpch100.lineitem.l_partkey (INT8), tpch100.lineitem.l_suppkey (INT8),
tpch100.lineitem.l_linenumber (INT8), tpch100.lineitem.l_quantity (FLOAT8),
tpch100.lineitem.l_extendedprice (FLOAT8), tpch100.lineitem.l_discount
(FLOAT8), tpch100.lineitem.l_tax (FLOAT8), tpch100.lineitem.l_returnflag
(TEXT), tpch100.lineitem.l_linestatus (TEXT), tpch100.lineitem.l_shipdate
(DATE), tpch100.lineitem.l_commitdate (DATE), tpch100.lineitem.l_receiptdate
(DATE), tpch100.lineitem.l_shipinstruct (TEXT), tpch100.lineitem.l_shipmode
(TEXT), tpch100.lineitem.l_comment (TEXT)}
```
> Improve the join order algorithm to consider missed cases of associative join
> operators
> ---------------------------------------------------------------------------------------
>
> Key: TAJO-1352
> URL: https://issues.apache.org/jira/browse/TAJO-1352
> Project: Tajo
> Issue Type: Improvement
> Components: planner/optimizer
> Reporter: Jihoon Son
> Assignee: Jihoon Son
> Attachments: JoinGraph.png
>
>
> TAJO-1277 fixes a bug related to the associativity of join operators, but
> there are still some missed cases that join operators are associative. This
> work should include the cases described in the following links:
> * http://stackoverflow.com/questions/20022196/are-left-outer-joins-associative
> * https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)