[ 
https://issues.apache.org/jira/browse/DRILL-5235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15848701#comment-15848701
 ] 

Zelaine Fong commented on DRILL-5235:
-------------------------------------

If I do:

explain plan for select columns[0] col1 from `employee.json` order by col1;

this is what I get:

{code}
00-00    Screen
00-01      Project(col1=[$0])
00-02        SelectionVectorRemover
00-03          Sort(sort0=[$0], dir0=[ASC])
00-04            Project(col1=[ITEM($0, 0)])
00-05              Scan(groupscan=[EasyGroupScan 
[selectionRoot=classpath:/employee.json, numFiles=1, columns=[`columns`[0]], 
files=[classpath:/employee.json]]])
{code}

I.e., there's only a single column in the project, and a single column in the 
sort.

> Column alias doubles sort data size when reading a text file
> ------------------------------------------------------------
>
>                 Key: DRILL-5235
>                 URL: https://issues.apache.org/jira/browse/DRILL-5235
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.9.0
>            Reporter: Paul Rogers
>            Priority: Minor
>
> Consider a simple query that reads data from a pipe-separated-value file and 
> sorts it. The file has just one column. The query looks something like this:
> {code}
> SELECT columns[0] col1 FROM `dfs.data`.`input-file.tbl` ORDER BY col1
> {code}
> Looking at the query plan, we see that a project operator not just creates an 
> alias {{col1}} for {{column\[0]}}, it also makes a *copy*.
> The particular input file is 20 GB in size and contains just one column. As a 
> result of materializing the alias, data size to the sort doubles to 40 GB. 
> This results in doubling query run time. If the sort must spill to disk, run 
> times increases by a much larger factor.
> The fix is to treat the alias as an alias, not a materialized copy.
> {code}
> {
>   "graph" : [ {
>     "pop" : "fs-scan",
>     "columns" : [ "`columns`[0]" ],
>   }, {
>     "pop" : "project",
>     "@id" : 4,
>     "exprs" : [ {
>       "ref" : "`col1`",
>       "expr" : "`columns`[0]"
>     } ],
>   }, {
>     "pop" : "external-sort",
>     "orderings" : [ {
>       "order" : "ASC",
>       "expr" : "`col1`",
>       "nullDirection" : "UNSPECIFIED"
>     } ],
>   }, {
>     "pop" : "selection-vector-remover",
>   }, {
>     "pop" : "project",
>   }, {
>     "pop" : "screen",
>   } ]
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to