[
https://issues.apache.org/jira/browse/DRILL-7566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17029781#comment-17029781
]
Igor Guzenko commented on DRILL-7566:
-------------------------------------
Hi all,
Hive views support was added in Drill 1.16.0 (DRILL-540) including materialized
ones.
Thanks,
Igor
> Performance of Common Table Expression query-15
> -----------------------------------------------
>
> Key: DRILL-7566
> URL: https://issues.apache.org/jira/browse/DRILL-7566
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization
> Affects Versions: 1.16.0
> Reporter: Aditya Allamraju
> Priority: Major
>
> Observed that the WITH clause is materializing the computation it did as many
> times as it is being referred in the main query. The purpose of CTE is
> defeated here.
> For instance, Query-15 of TPC:
> {code:java}
> . . . . . . . . . . . . . . )> WITH revenue0(supplier_no , total_revenue) AS (
> . . . . . . . . . . . . . . )> SELECT l_suppkey, SUM(l_extendedprice * (1
> - l_discount))
> . . . . . . . . . . . . . . )> FROM lineitem
> . . . . . . . . . . . . . . )> WHERE l_shipdate >= '1996-07-01'
> . . . . . . . . . . . . . . )> AND l_shipdate < DATE_ADD('1996-07-01',
> INTERVAL '90' DAY)
> . . . . . . . . . . . . . . )> GROUP BY l_suppkey )
> . . . . . . . . . . semicolon> SELECT s_suppkey, s_name, s_address, s_phone,
> total_revenue
> . . . . . . . . . . semicolon> FROM supplier, revenue0
> . . . . . . . . . . semicolon> WHERE s_suppkey = supplier_no
> . . . . . . . . . . semicolon> AND total_revenue = (SELECT
> MAX(total_revenue) FROM revenue0)
> . . . . . . . . . . semicolon> ORDER BY s_suppkey;
> +-----------+--------------------+-------------------------------+-----------------+---------------+
> | s_suppkey | s_name | s_address |
> s_phone | total_revenue |
> +-----------+--------------------+-------------------------------+-----------------+---------------+
> | 493 | Supplier#000000493 | 7tdI3AtlDll57sj5K48WLX j5RDbc |
> 21-252-702-2543 | 1779637.1723 |
> +-----------+--------------------+-------------------------------+-----------------+---------------+
> 1 row selected (9.093 seconds)
> apache drill (hive.tpch_text)>
> {code}
> I just performed the above on a small subset. You can see the behavior from
> the explain plan and query profile. But usually, CTE based queries are run on
> huge tables like in users case(each table running into few TB's!).
> Explain plan for above query:
> {code:java}
> apache drill (hive.tpch_text)> explain plan for
> . . . . . . . . . . semicolon> WITH revenue0(supplier_no , total_revenue) AS (
> . . . . . . . . . . . . . . )> SELECT l_suppkey, SUM(l_extendedprice * (1
> - l_discount))
> . . . . . . . . . . . . . . )> FROM lineitem
> . . . . . . . . . . . . . . )> WHERE l_shipdate >= '1996-07-01'
> . . . . . . . . . . . . . . )> AND l_shipdate < DATE_ADD('1996-07-01',
> INTERVAL '90' DAY)
> . . . . . . . . . . . . . . )> GROUP BY l_suppkey )
> . . . . . . . . . . semicolon> SELECT s_suppkey, s_name, s_address, s_phone,
> total_revenue
> . . . . . . . . . . semicolon> FROM supplier, revenue0
> . . . . . . . . . . semicolon> WHERE s_suppkey = supplier_no
> . . . . . . . . . . semicolon> AND total_revenue = (SELECT
> MAX(total_revenue) FROM revenue0)
> . . . . . . . . . . semicolon> ORDER BY s_suppkey;
> +----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
> | text
> | json
> |
> +----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
> | 00-00 Screen
> 00-01 Project(s_suppkey=[$0], s_name=[$1], s_address=[$2], s_phone=[$3],
> total_revenue=[$4])
> 00-02 SelectionVectorRemover
> 00-03 Sort(sort0=[$0], dir0=[ASC])
> 00-04 Project(s_suppkey=[$0], s_name=[$1], s_address=[$2],
> s_phone=[$3], total_revenue=[$5])
> 00-05 Project(s_suppkey=[$3], s_name=[$4], s_address=[$5],
> s_phone=[$6], l_suppkey=[$0], EXPR$1=[$1], EXPR$0=[$2])
> 00-06 HashJoin(condition=[=($3, $0)], joinType=[inner],
> semi-join: =[false])
> 00-08 HashJoin(condition=[=($1, $2)], joinType=[inner],
> semi-join: =[false])
> 00-10 HashAgg(group=[{0}], EXPR$1=[SUM($1)])
> 00-12 Project(l_suppkey=[$0], EXPR$1=[$1])
> 00-14 HashToRandomExchange(dist0=[[$0]])
> 01-01 UnorderedMuxExchange
> 03-01 Project(l_suppkey=[$0], EXPR$1=[$1],
> E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0, 1301011)])
> 03-02 HashAgg(group=[{0}], EXPR$1=[SUM($1)])
> 03-03 Project(l_suppkey=[$0], $f1=[*($1, -(1,
> $2))])
> 03-04 SelectionVectorRemover
> 03-05 Filter(condition=[AND(>=($3,
> '1996-07-01'), <($3, 1996-09-29 00:00:00))])
> 03-06 Scan(table=[[hive, tpch_text,
> lineitem]], groupscan=[HiveScan [table=Table(dbName:tpch_text,
> tableName:lineitem), columns=[`l_suppkey`, `l_extendedprice`, `l_discount`,
> `l_shipdate`], numPartitions=0, partitions= null,
> inputDirectories=[maprfs:/tpch/lineitem], confProperties={}]])
> 00-09 StreamAgg(group=[{}], EXPR$0=[MAX($0)])
> 00-11 Project(EXPR$1=[$1])
> 00-13 HashAgg(group=[{0}], EXPR$1=[SUM($1)])
> 00-15 Project(l_suppkey=[$0], EXPR$1=[$1])
> 00-16 HashToRandomExchange(dist0=[[$0]])
> 02-01 UnorderedMuxExchange
> 04-01 Project(l_suppkey=[$0], EXPR$1=[$1],
> E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0, 1301011)])
> 04-02 HashAgg(group=[{0}], EXPR$1=[SUM($1)])
> 04-03 Project(l_suppkey=[$0], $f1=[*($1,
> -(1, $2))])
> 04-04 SelectionVectorRemover
> 04-05 Filter(condition=[AND(>=($3,
> '1996-07-01'), <($3, 1996-09-29 00:00:00))])
> 04-06 Scan(table=[[hive, tpch_text,
> lineitem]], groupscan=[HiveScan [table=Table(dbName:tpch_text,
> tableName:lineitem), columns=[`l_suppkey`, `l_extendedprice`, `l_discount`,
> `l_shipdate`], numPartitions=0, partitions= null,
> inputDirectories=[maprfs:/tpch/lineitem], confProperties={}]])
> 00-07 Scan(table=[[hive, tpch_text, supplier]],
> groupscan=[HiveScan [table=Table(dbName:tpch_text, tableName:supplier),
> columns=[`s_suppkey`, `s_name`, `s_address`, `s_phone`], numPartitions=0,
> partitions= null, inputDirectories=[maprfs:/tpch/supplier],
> confProperties={}]])
> {code}
>
> Observe Step 03-06 and 04-06. One of them could be avoided.
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)