[ 
https://issues.apache.org/jira/browse/TAJO-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14584744#comment-14584744
 ] 

ASF GitHub Bot commented on TAJO-1352:
--------------------------------------

Github user jihoonson commented on the pull request:

    https://github.com/apache/tajo/pull/593#issuecomment-111736855
  
    Here is the simple evaluation result.
    
    ### Query
    ```
    select 
      count(*) 
    from 
      lineitem left outer join orders 
        on l_orderkey = o_orderkey 
      left outer join partsupp 
        on ps_suppkey = o_custkey 
      left outer join customer 
        on ps_suppkey = c_custkey 
      left outer join part 
        on p_partkey = c_nationkey
    ```
    
    ### Data
    * TPC-H of scale factor 100
    
    ### Cluster
    * One master and 3 workers
    * Each worker equips 4 cores, 8 GB memory, and 2 HDDs.
    
    ### Performance comparison
    * Elapsed time
     * Before patch: 25 mins, 23 sec
     * After patch: 19 mins, 59 sec
    * Performance improvement ratio: about 20%
    
    ### Query plan
    The query execution time is reduced due to the improved query plan as 
follows.
    #### Before patch
    ```
    -----------------------------
    Query Block Graph
    -----------------------------
    |-#ROOT
    -----------------------------
    Optimization Log:
    [LogicalPlan]
        > ProjectionNode is eliminated.
    [#ROOT]
        > Non-optimized join order: ((((tpch100.lineitem ⟕ tpch100.orders) ⟕ 
tpch100.partsupp) ⟕ tpch100.customer) ⟕ tpch100.part) (cost: 
1.0447965953264456E46)
        > Optimized join order    : ((((tpch100.lineitem ⟕ tpch100.orders) ⟕ 
tpch100.partsupp) ⟕ tpch100.customer) ⟕ tpch100.part) (cost: 
1.0447965859707236E46)
    -----------------------------
    
    GROUP_BY(10)()
      => exprs: (count())
      => target list: ?count (INT8)
      => out schema:{(1) ?count (INT8)}
      => in schema:{(0) }
       JOIN(15)(LEFT_OUTER)
         => Join Cond: tpch100.part.p_partkey (INT8) = 
tpch100.customer.c_nationkey (INT8)
         => target list: 
         => out schema: {(0) }
         => in schema: {(2) tpch100.customer.c_nationkey (INT8), 
tpch100.part.p_partkey (INT8)}
          SCAN(7) on tpch100.part
            => target list: tpch100.part.p_partkey (INT8)
            => out schema: {(1) tpch100.part.p_partkey (INT8)}
            => in schema: {(9) tpch100.part.p_partkey (INT8), 
tpch100.part.p_name (TEXT), tpch100.part.p_mfgr (TEXT), tpch100.part.p_brand 
(TEXT), tpch100.part.p_type (TEXT), tpch100.part.p_size (INT4), 
tpch100.part.p_container (TEXT), tpch100.part.p_retailprice (FLOAT8), 
tpch100.part.p_comment (TEXT)}
          JOIN(14)(LEFT_OUTER)
            => Join Cond: tpch100.partsupp.ps_suppkey (INT8) = 
tpch100.customer.c_custkey (INT8)
            => target list: tpch100.customer.c_nationkey (INT8)
            => out schema: {(1) tpch100.customer.c_nationkey (INT8)}
            => in schema: {(3) tpch100.partsupp.ps_suppkey (INT8), 
tpch100.customer.c_nationkey (INT8), tpch100.customer.c_custkey (INT8)}
             SCAN(5) on tpch100.customer
               => target list: tpch100.customer.c_nationkey (INT8), 
tpch100.customer.c_custkey (INT8)
               => out schema: {(2) tpch100.customer.c_nationkey (INT8), 
tpch100.customer.c_custkey (INT8)}
               => in schema: {(8) tpch100.customer.c_custkey (INT8), 
tpch100.customer.c_name (TEXT), tpch100.customer.c_address (TEXT), 
tpch100.customer.c_nationkey (INT8), tpch100.customer.c_phone (TEXT), 
tpch100.customer.c_acctbal (FLOAT8), tpch100.customer.c_mktsegment (TEXT), 
tpch100.customer.c_comment (TEXT)}
             JOIN(13)(LEFT_OUTER)
               => Join Cond: tpch100.partsupp.ps_suppkey (INT8) = 
tpch100.orders.o_custkey (INT8)
               => target list: tpch100.partsupp.ps_suppkey (INT8)
               => out schema: {(1) tpch100.partsupp.ps_suppkey (INT8)}
               => in schema: {(2) tpch100.orders.o_custkey (INT8), 
tpch100.partsupp.ps_suppkey (INT8)}
                SCAN(3) on tpch100.partsupp
                  => target list: tpch100.partsupp.ps_suppkey (INT8)
                  => out schema: {(1) tpch100.partsupp.ps_suppkey (INT8)}
                  => in schema: {(5) tpch100.partsupp.ps_partkey (INT8), 
tpch100.partsupp.ps_suppkey (INT8), tpch100.partsupp.ps_availqty (INT4), 
tpch100.partsupp.ps_supplycost (FLOAT8), tpch100.partsupp.ps_comment (TEXT)}
                JOIN(12)(LEFT_OUTER)
                  => Join Cond: tpch100.lineitem.l_orderkey (INT8) = 
tpch100.orders.o_orderkey (INT8)
                  => target list: tpch100.orders.o_custkey (INT8)
                  => out schema: {(1) tpch100.orders.o_custkey (INT8)}
                  => in schema: {(3) tpch100.lineitem.l_orderkey (INT8), 
tpch100.orders.o_custkey (INT8), tpch100.orders.o_orderkey (INT8)}
                   SCAN(1) on tpch100.orders
                     => target list: tpch100.orders.o_custkey (INT8), 
tpch100.orders.o_orderkey (INT8)
                     => out schema: {(2) tpch100.orders.o_custkey (INT8), 
tpch100.orders.o_orderkey (INT8)}
                     => in schema: {(9) tpch100.orders.o_orderkey (INT8), 
tpch100.orders.o_custkey (INT8), tpch100.orders.o_orderstatus (TEXT), 
tpch100.orders.o_totalprice (FLOAT8), tpch100.orders.o_orderdate (DATE), 
tpch100.orders.o_orderpriority (TEXT), tpch100.orders.o_clerk (TEXT), 
tpch100.orders.o_shippriority (INT4), tpch100.orders.o_comment (TEXT)}
                   SCAN(0) on tpch100.lineitem
                     => target list: tpch100.lineitem.l_orderkey (INT8)
                     => out schema: {(1) tpch100.lineitem.l_orderkey (INT8)}
                     => in schema: {(16) tpch100.lineitem.l_orderkey (INT8), 
tpch100.lineitem.l_partkey (INT8), tpch100.lineitem.l_suppkey (INT8), 
tpch100.lineitem.l_linenumber (INT8), tpch100.lineitem.l_quantity (FLOAT8), 
tpch100.lineitem.l_extendedprice (FLOAT8), tpch100.lineitem.l_discount 
(FLOAT8), tpch100.lineitem.l_tax (FLOAT8), tpch100.lineitem.l_returnflag 
(TEXT), tpch100.lineitem.l_linestatus (TEXT), tpch100.lineitem.l_shipdate 
(DATE), tpch100.lineitem.l_commitdate (DATE), tpch100.lineitem.l_receiptdate 
(DATE), tpch100.lineitem.l_shipinstruct (TEXT), tpch100.lineitem.l_shipmode 
(TEXT), tpch100.lineitem.l_comment (TEXT)}
    ```
    
    #### After patch
    ```
    -----------------------------
    Query Block Graph
    -----------------------------
    |-#ROOT
    -----------------------------
    Optimization Log:
    [LogicalPlan]
        > ProjectionNode is eliminated.
    [#ROOT]
        > Non-optimized join order: ((((tpch100.lineitem ⟕ tpch100.orders) ⟕ 
tpch100.partsupp) ⟕ tpch100.customer) ⟕ tpch100.part) (cost: 
7.933924122078524E46)
        > Optimized join order    : ((tpch100.lineitem ⟕ (tpch100.orders ⟕ 
tpch100.partsupp)) ⟕ (tpch100.customer ⟕ tpch100.part)) (cost: 
4.016549062824562E47)
    -----------------------------
    
    GROUP_BY(10)()
      => exprs: (count())
      => target list: ?count (INT8)
      => out schema:{(1) ?count (INT8)}
      => in schema:{(0) }
       JOIN(15)(LEFT_OUTER)
         => Join Cond: tpch100.partsupp.ps_suppkey (INT8) = 
tpch100.customer.c_custkey (INT8)
         => target list: 
         => out schema: {(0) }
         => in schema: {(2) tpch100.partsupp.ps_suppkey (INT8), 
tpch100.customer.c_custkey (INT8)}
          JOIN(14)(LEFT_OUTER)
            => Join Cond: tpch100.part.p_partkey (INT8) = 
tpch100.customer.c_nationkey (INT8)
            => target list: tpch100.customer.c_custkey (INT8)
            => out schema: {(1) tpch100.customer.c_custkey (INT8)}
            => in schema: {(3) tpch100.customer.c_custkey (INT8), 
tpch100.customer.c_nationkey (INT8), tpch100.part.p_partkey (INT8)}
             SCAN(7) on tpch100.part
               => target list: tpch100.part.p_partkey (INT8)
               => out schema: {(1) tpch100.part.p_partkey (INT8)}
               => in schema: {(9) tpch100.part.p_partkey (INT8), 
tpch100.part.p_name (TEXT), tpch100.part.p_mfgr (TEXT), tpch100.part.p_brand 
(TEXT), tpch100.part.p_type (TEXT), tpch100.part.p_size (INT4), 
tpch100.part.p_container (TEXT), tpch100.part.p_retailprice (FLOAT8), 
tpch100.part.p_comment (TEXT)}
             SCAN(5) on tpch100.customer
               => target list: tpch100.customer.c_custkey (INT8), 
tpch100.customer.c_nationkey (INT8)
               => out schema: {(2) tpch100.customer.c_custkey (INT8), 
tpch100.customer.c_nationkey (INT8)}
               => in schema: {(8) tpch100.customer.c_custkey (INT8), 
tpch100.customer.c_name (TEXT), tpch100.customer.c_address (TEXT), 
tpch100.customer.c_nationkey (INT8), tpch100.customer.c_phone (TEXT), 
tpch100.customer.c_acctbal (FLOAT8), tpch100.customer.c_mktsegment (TEXT), 
tpch100.customer.c_comment (TEXT)}
          JOIN(13)(LEFT_OUTER)
            => Join Cond: tpch100.lineitem.l_orderkey (INT8) = 
tpch100.orders.o_orderkey (INT8)
            => target list: tpch100.partsupp.ps_suppkey (INT8)
            => out schema: {(1) tpch100.partsupp.ps_suppkey (INT8)}
            => in schema: {(3) tpch100.lineitem.l_orderkey (INT8), 
tpch100.partsupp.ps_suppkey (INT8), tpch100.orders.o_orderkey (INT8)}
             JOIN(12)(LEFT_OUTER)
               => Join Cond: tpch100.partsupp.ps_suppkey (INT8) = 
tpch100.orders.o_custkey (INT8)
               => target list: tpch100.partsupp.ps_suppkey (INT8), 
tpch100.orders.o_orderkey (INT8)
               => out schema: {(2) tpch100.partsupp.ps_suppkey (INT8), 
tpch100.orders.o_orderkey (INT8)}
               => in schema: {(3) tpch100.orders.o_orderkey (INT8), 
tpch100.orders.o_custkey (INT8), tpch100.partsupp.ps_suppkey (INT8)}
                SCAN(3) on tpch100.partsupp
                  => target list: tpch100.partsupp.ps_suppkey (INT8)
                  => out schema: {(1) tpch100.partsupp.ps_suppkey (INT8)}
                  => in schema: {(5) tpch100.partsupp.ps_partkey (INT8), 
tpch100.partsupp.ps_suppkey (INT8), tpch100.partsupp.ps_availqty (INT4), 
tpch100.partsupp.ps_supplycost (FLOAT8), tpch100.partsupp.ps_comment (TEXT)}
                SCAN(1) on tpch100.orders
                  => target list: tpch100.orders.o_orderkey (INT8), 
tpch100.orders.o_custkey (INT8)
                  => out schema: {(2) tpch100.orders.o_orderkey (INT8), 
tpch100.orders.o_custkey (INT8)}
                  => in schema: {(9) tpch100.orders.o_orderkey (INT8), 
tpch100.orders.o_custkey (INT8), tpch100.orders.o_orderstatus (TEXT), 
tpch100.orders.o_totalprice (FLOAT8), tpch100.orders.o_orderdate (DATE), 
tpch100.orders.o_orderpriority (TEXT), tpch100.orders.o_clerk (TEXT), 
tpch100.orders.o_shippriority (INT4), tpch100.orders.o_comment (TEXT)}
             SCAN(0) on tpch100.lineitem
               => target list: tpch100.lineitem.l_orderkey (INT8)
               => out schema: {(1) tpch100.lineitem.l_orderkey (INT8)}
               => in schema: {(16) tpch100.lineitem.l_orderkey (INT8), 
tpch100.lineitem.l_partkey (INT8), tpch100.lineitem.l_suppkey (INT8), 
tpch100.lineitem.l_linenumber (INT8), tpch100.lineitem.l_quantity (FLOAT8), 
tpch100.lineitem.l_extendedprice (FLOAT8), tpch100.lineitem.l_discount 
(FLOAT8), tpch100.lineitem.l_tax (FLOAT8), tpch100.lineitem.l_returnflag 
(TEXT), tpch100.lineitem.l_linestatus (TEXT), tpch100.lineitem.l_shipdate 
(DATE), tpch100.lineitem.l_commitdate (DATE), tpch100.lineitem.l_receiptdate 
(DATE), tpch100.lineitem.l_shipinstruct (TEXT), tpch100.lineitem.l_shipmode 
(TEXT), tpch100.lineitem.l_comment (TEXT)}
    ```


> Improve the join order algorithm to consider missed cases of associative join 
> operators
> ---------------------------------------------------------------------------------------
>
>                 Key: TAJO-1352
>                 URL: https://issues.apache.org/jira/browse/TAJO-1352
>             Project: Tajo
>          Issue Type: Improvement
>          Components: planner/optimizer
>            Reporter: Jihoon Son
>            Assignee: Jihoon Son
>         Attachments: JoinGraph.png
>
>
> TAJO-1277 fixes a bug related to the associativity of join operators, but 
> there are still some missed cases that join operators are associative. This 
> work should include the cases described in the following links:
> * http://stackoverflow.com/questions/20022196/are-left-outer-joins-associative
> * https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to