Rahul Challapalli created DRILL-5148:
----------------------------------------
Summary: Replace hash-distribution with a simple round-robin distribution for a simple order by query
Key: DRILL-5148
URL: https://issues.apache.org/jira/browse/DRILL-5148
Project: Apache Drill
Issue Type: Bug
Components: Execution - Relational Operators, Query Planning & Optimization
Affects Versions: 1.10.0
Reporter: Rahul Challapalli
git.commit.id.abbrev=cf2b7c7
The plan below shows that we use hash distribution to avoid data skew.
However, for this simple order by query a round-robin distribution would be sufficient.
{code}
explain plan for select * from
dfs.`/drill/testdata/resource-manager/5kwidecolumns_500k.tbl` order by
columns[0];
+------+------+
| text | json |
+------+------+
| 00-00 Screen
00-01 Project(*=[$0])
00-02 Project(T2¦¦*=[$0])
00-03 SingleMergeExchange(sort0=[1 ASC])
01-01 SelectionVectorRemover
01-02 Sort(sort0=[$1], dir0=[ASC])
01-03 Project(T2¦¦*=[$0], EXPR$1=[$1])
01-04 HashToRandomExchange(dist0=[[$1]])
02-01 UnorderedMuxExchange
03-01 Project(T2¦¦*=[$0], EXPR$1=[$1], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($1)])
03-02 Project(T2¦¦*=[$0], EXPR$1=[ITEM($1, 0)])
03-03 Project(T2¦¦*=[$0], columns=[$1])
03-04 Scan(groupscan=[EasyGroupScan [selectionRoot=maprfs:/drill/testdata/resource-manager/5kwidecolumns_500k.tbl, numFiles=1, columns=[`*`], files=[maprfs:///drill/testdata/resource-manager/5kwidecolumns_500k.tbl]]])
{code}
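To illustrate why round-robin could be enough here, the following is a minimal Python sketch, not Drill code. It assumes a made-up skewed input and hypothetical helper names (`hash_distribute`, `round_robin_distribute`, `sort_and_merge`). Each fragment sorts its rows locally and a single merge step combines them, roughly mirroring the Sort followed by SingleMergeExchange in the plan above. The merged key order does not depend on how rows were distributed, while the per-fragment row counts do.

```python
import heapq
from collections import defaultdict


def hash_distribute(rows, num_fragments, key):
    """Send each row to the fragment chosen by hashing its sort key."""
    fragments = defaultdict(list)
    for row in rows:
        fragments[hash(key(row)) % num_fragments].append(row)
    return [fragments[i] for i in range(num_fragments)]


def round_robin_distribute(rows, num_fragments):
    """Send row i to fragment i % num_fragments, ignoring its contents."""
    return [rows[i::num_fragments] for i in range(num_fragments)]


def sort_and_merge(fragments, key):
    """Sort every fragment locally, then merge the sorted streams into one."""
    sorted_streams = (sorted(fragment, key=key) for fragment in fragments)
    return list(heapq.merge(*sorted_streams, key=key))


# Hypothetical skewed input: 900 of the 1000 rows share the sort key 7.
rows = [(7, i) for i in range(900)] + [(k, 0) for k in range(100)]
key = lambda row: row[0]

hashed = hash_distribute(rows, 4, key)
robin = round_robin_distribute(rows, 4)

print("rows per fragment, hash:       ", [len(f) for f in hashed])
print("rows per fragment, round-robin:", [len(f) for f in robin])

# Both distributions yield the same globally sorted sequence of keys.
hashed_keys = [key(r) for r in sort_and_merge(hashed, key)]
robin_keys = [key(r) for r in sort_and_merge(robin, key)]
print("same sorted key order:", hashed_keys == robin_keys)
```

In this sketch the hash variant places every row with the repeated key on one fragment, whereas round-robin keeps the fragments the same size. Whether the Drill planner can make the same substitution safely is the question this issue raises.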