[
https://issues.apache.org/jira/browse/DRILL-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16242709#comment-16242709
]
Chun Chang commented on DRILL-5138:
-----------------------------------
I ran the query against MapR Drill 1.11.0, and it returned in 81 seconds.
[root@perfnode166 catalog_sales]# sqlline --maxWidth=10000 -u "jdbc:drill:zk=10.10.30.166:5181"
OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
apache drill 1.11.0-mapr
"drill baby drill"
0: jdbc:drill:zk=10.10.30.166:5181> select * from dfs.`/drill/testdata/tpcds_sf100/parquet/catalog_sales` order by cs_quantity, cs_wholesale_cost limit 1;
+------------------+-------------------+----------------------+-------------------+--------------------+---------------------+----------------+----------------------+--------------------+---------------------+-------------------+-------------+------------------------+-------------+----------------+--------------+-----------------------+---------------------------+----------------------+----------------+------------------+--------------+--------------+-----------------+------------------+-------------------+----------------------+------------------+-------------------+------------------+------------------+------------------+------------------+--------------------+
| cs_bill_addr_sk | cs_bill_cdemo_sk | cs_bill_customer_sk | cs_bill_hdemo_sk | cs_call_center_sk | cs_catalog_page_sk | cs_coupon_amt | cs_ext_discount_amt | cs_ext_list_price | cs_ext_sales_price | cs_ext_ship_cost | cs_ext_tax | cs_ext_wholesale_cost | cs_item_sk | cs_list_price | cs_net_paid | cs_net_paid_inc_ship | cs_net_paid_inc_ship_tax | cs_net_paid_inc_tax | cs_net_profit | cs_order_number | cs_promo_sk | cs_quantity | cs_sales_price | cs_ship_addr_sk | cs_ship_cdemo_sk | cs_ship_customer_sk | cs_ship_date_sk | cs_ship_hdemo_sk | cs_ship_mode_sk | cs_sold_date_sk | cs_sold_time_sk | cs_warehouse_sk | cs_wholesale_cost |
+------------------+-------------------+----------------------+-------------------+--------------------+---------------------+----------------+----------------------+--------------------+---------------------+-------------------+-------------+------------------------+-------------+----------------+--------------+-----------------------+---------------------------+----------------------+----------------+------------------+--------------+--------------+-----------------+------------------+-------------------+----------------------+------------------+-------------------+------------------+------------------+------------------+------------------+--------------------+
| 184649 | 555979 | 1796891 | 1114 | 24 | 14393 | 0.00 | 0.02 | 1.82 | 1.80 | 0.25 | 0.00 | 1.00 | 108618 | 1.82 | 1.80 | 2.05 | 2.05 | 1.80 | 0.80 | 15928478 | 540 | 1 | 1.80 | 184649 | 555979 | 1796891 | 2452671 | 1114 | 9 | 2452640 | 38871 | 1 | 1.00 |
+------------------+-------------------+----------------------+-------------------+--------------------+---------------------+----------------+----------------------+--------------------+---------------------+-------------------+-------------+------------------------+-------------+----------------+--------------+-----------------------+---------------------------+----------------------+----------------+------------------+--------------+--------------+-----------------+------------------+-------------------+----------------------+------------------+-------------------+------------------+------------------+------------------+------------------+--------------------+
1 row selected (81.577 seconds)
> TopN operator on top of ~110 GB data set is very slow
> -----------------------------------------------------
>
> Key: DRILL-5138
> URL: https://issues.apache.org/jira/browse/DRILL-5138
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Relational Operators
> Reporter: Rahul Challapalli
> Assignee: Timothy Farkas
>
> git.commit.id.abbrev=cf2b7c7
> No of cores : 23
> No of disks : 5
> DRILL_MAX_DIRECT_MEMORY="24G"
> DRILL_MAX_HEAP="12G"
> The below query ran for more than 4 hours and did not complete. The table is
> ~110 GB
> {code}
> select * from catalog_sales order by cs_quantity, cs_wholesale_cost limit 1;
> {code}
> Physical Plan :
> {code}
> 00-00 Screen : rowType = RecordType(ANY *): rowcount = 1.0, cumulative
> cost = {1.00798629141E10 rows, 4.17594320691E10 cpu, 0.0 io,
> 4.1287118487552E13 network, 0.0 memory}, id = 352
> 00-01 Project(*=[$0]) : rowType = RecordType(ANY *): rowcount = 1.0,
> cumulative cost = {1.0079862914E10 rows, 4.1759432069E10 cpu, 0.0 io,
> 4.1287118487552E13 network, 0.0 memory}, id = 351
> 00-02 Project(T0¦¦*=[$0]) : rowType = RecordType(ANY T0¦¦*): rowcount
> = 1.0, cumulative cost = {1.0079862914E10 rows, 4.1759432069E10 cpu, 0.0 io,
> 4.1287118487552E13 network, 0.0 memory}, id = 350
> 00-03 SelectionVectorRemover : rowType = RecordType(ANY T0¦¦*, ANY
> cs_quantity, ANY cs_wholesale_cost): rowcount = 1.0, cumulative cost =
> {1.0079862914E10 rows, 4.1759432069E10 cpu, 0.0 io, 4.1287118487552E13
> network, 0.0 memory}, id = 349
> 00-04 Limit(fetch=[1]) : rowType = RecordType(ANY T0¦¦*, ANY
> cs_quantity, ANY cs_wholesale_cost): rowcount = 1.0, cumulative cost =
> {1.0079862913E10 rows, 4.1759432068E10 cpu, 0.0 io, 4.1287118487552E13
> network, 0.0 memory}, id = 348
> 00-05 SingleMergeExchange(sort0=[1 ASC], sort1=[2 ASC]) :
> rowType = RecordType(ANY T0¦¦*, ANY cs_quantity, ANY cs_wholesale_cost):
> rowcount = 1.439980416E9, cumulative cost = {1.0079862912E10 rows,
> 4.1759432064E10 cpu, 0.0 io, 4.1287118487552E13 network, 0.0 memory}, id = 347
> 01-01 SelectionVectorRemover : rowType = RecordType(ANY T0¦¦*,
> ANY cs_quantity, ANY cs_wholesale_cost): rowcount = 1.439980416E9, cumulative
> cost = {8.639882496E9 rows, 3.0239588736E10 cpu, 0.0 io, 2.3592639135744E13
> network, 0.0 memory}, id = 346
> 01-02 TopN(limit=[1]) : rowType = RecordType(ANY T0¦¦*, ANY
> cs_quantity, ANY cs_wholesale_cost): rowcount = 1.439980416E9, cumulative
> cost = {7.19990208E9 rows, 2.879960832E10 cpu, 0.0 io, 2.3592639135744E13
> network, 0.0 memory}, id = 345
> 01-03 Project(T0¦¦*=[$0], cs_quantity=[$1],
> cs_wholesale_cost=[$2]) : rowType = RecordType(ANY T0¦¦*, ANY cs_quantity,
> ANY cs_wholesale_cost): rowcount = 1.439980416E9, cumulative cost =
> {5.759921664E9 rows, 2.879960832E10 cpu, 0.0 io, 2.3592639135744E13 network,
> 0.0 memory}, id = 344
> 01-04 HashToRandomExchange(dist0=[[$1]], dist1=[[$2]]) :
> rowType = RecordType(ANY T0¦¦*, ANY cs_quantity, ANY cs_wholesale_cost, ANY
> E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 1.439980416E9, cumulative cost =
> {5.759921664E9 rows, 2.879960832E10 cpu, 0.0 io, 2.3592639135744E13 network,
> 0.0 memory}, id = 343
> 02-01 UnorderedMuxExchange : rowType = RecordType(ANY
> T0¦¦*, ANY cs_quantity, ANY cs_wholesale_cost, ANY
> E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 1.439980416E9, cumulative cost =
> {4.319941248E9 rows, 1.1519843328E10 cpu, 0.0 io, 0.0 network, 0.0 memory},
> id = 342
> 03-01 Project(T0¦¦*=[$0], cs_quantity=[$1],
> cs_wholesale_cost=[$2], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($2,
> hash32AsDouble($1))]) : rowType = RecordType(ANY T0¦¦*, ANY cs_quantity, ANY
> cs_wholesale_cost, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 1.439980416E9,
> cumulative cost = {2.879960832E9 rows, 1.0079862912E10 cpu, 0.0 io, 0.0
> network, 0.0 memory}, id = 341
> 03-02 Project(T0¦¦*=[$0], cs_quantity=[$1],
> cs_wholesale_cost=[$2]) : rowType = RecordType(ANY T0¦¦*, ANY cs_quantity,
> ANY cs_wholesale_cost): rowcount = 1.439980416E9, cumulative cost =
> {1.439980416E9 rows, 4.319941248E9 cpu, 0.0 io, 0.0 network, 0.0 memory}, id
> = 340
> 03-03 Scan(groupscan=[ParquetGroupScan
> [entries=[ReadEntryWithPath
> [path=maprfs:///drill/testdata/tpcds/parquet/sf1000/catalog_sales]],
> selectionRoot=maprfs:/drill/testdata/tpcds/parquet/sf1000/catalog_sales,
> numFiles=1, usedMetadataFile=false, columns=[`*`]]]) : rowType =
> (DrillRecordRow[*, cs_quantity, cs_wholesale_cost]): rowcount =
> 1.439980416E9, cumulative cost = {1.439980416E9 rows, 4.319941248E9 cpu, 0.0
> io, 0.0 network, 0.0 memory}, id = 339
> {code}
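
For reference, the TopN(limit=[1]) step in the plan above is the operator that is meant to avoid a full sort: a top-N evaluation generally only needs to remember the best N rows seen so far. The following is a minimal Python sketch of that general technique, not Drill's actual implementation. The function name `top_n` and the sample rows are hypothetical and only mirror the two sort columns from the query.

```python
import heapq


def top_n(rows, n, key):
    """Return the n smallest rows by `key` without sorting the full input.

    heapq.nsmallest streams through `rows` and keeps at most n candidates
    in memory, which is the property a top-N evaluation relies on.
    """
    return heapq.nsmallest(n, rows, key=key)


# Hypothetical sample rows, reduced to the two ORDER BY columns.
rows = [
    {"cs_quantity": 3, "cs_wholesale_cost": 2.50},
    {"cs_quantity": 1, "cs_wholesale_cost": 4.00},
    {"cs_quantity": 1, "cs_wholesale_cost": 1.00},
]


def sort_key(row):
    # Same ordering as: ORDER BY cs_quantity, cs_wholesale_cost
    return (row["cs_quantity"], row["cs_wholesale_cost"])


first = top_n(rows, 1, sort_key)  # equivalent of LIMIT 1
```

With this approach the work per input row is bounded by N rather than by the table size, which is why a LIMIT 1 over a large table would not normally be expected to behave like a full sort.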