[
https://issues.apache.org/jira/browse/DRILL-4387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15150631#comment-15150631
]
ASF GitHub Bot commented on DRILL-4387:
---------------------------------------
GitHub user jinfengni opened a pull request:
https://github.com/apache/drill/pull/379
DRILL-4387: GroupScan or ScanBatchCreator should not use star column in case of skipAll query.
The skipAll query should be handled in RecordReader.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jinfengni/incubator-drill DRILL-4387
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/drill/pull/379.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #379
----
commit 5c1edc42dcad6c3b5943424b9a8373cf6ff51753
Author: Jinfeng Ni <[email protected]>
Date: 2016-02-12T22:18:59Z
DRILL-4387: GroupScan or ScanBatchCreator should not use star column in
case of skipAll query.
The skipAll query should be handled in RecordReader.
----
> Improve execution side when it handles skipAll query
> ----------------------------------------------------
>
> Key: DRILL-4387
> URL: https://issues.apache.org/jira/browse/DRILL-4387
> Project: Apache Drill
> Issue Type: Bug
> Reporter: Jinfeng Ni
> Assignee: Jinfeng Ni
> Fix For: 1.6.0
>
>
> DRILL-4279 changed the planner side and the RecordReader on the execution
> side to handle skipAll queries. However, there appear to be other places in
> the codebase that do not handle skipAll queries efficiently. In particular,
> in GroupScan or ScanBatchCreator, a NULL or empty column list is replaced
> with the star column. This forces the execution side (RecordReader) to fetch
> all columns from the data source, which adds significant performance
> overhead to the SCAN operator.
> To improve Drill's performance, we should change those places as well, as
> follow-up work to DRILL-4279.
> One simple example of this problem is:
> {code}
> SELECT DISTINCT substring(dir1, 5) from dfs.`/Path/To/ParquetTable`;
> {code}
> The query does not require any regular column from the Parquet file. However,
> ParquetRowGroupScan and ParquetScanBatchCreator will use the star column as
> the column list. If the table has dozens or hundreds of columns, this makes
> the SCAN operator much more expensive than necessary.
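The difference between the two behaviors described above can be sketched in a few lines of Java. This is a minimal illustration, not Drill's actual code: the class SkipAllSketch, its methods, and the use of plain strings for column names are all hypothetical. It also assumes that a null list means "no projection specified" while an empty list means "skipAll"; the issue itself only states that both are currently replaced with the star column.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; these are not Drill's classes or API.
final class SkipAllSketch {
    static final String STAR = "*";

    // Behavior described in the issue: a NULL or empty column list
    // is replaced with the star column, so the reader fetches everything.
    static List<String> legacyColumns(List<String> projected) {
        if (projected == null || projected.isEmpty()) {
            return Collections.singletonList(STAR);
        }
        return projected;
    }

    // Assumed alternative: only a missing (null) list falls back to star;
    // an empty list is passed through unchanged to the reader.
    static List<String> passThroughColumns(List<String> projected) {
        if (projected == null) {
            return Collections.singletonList(STAR);
        }
        return projected;
    }

    // Assumed reader-side check: an empty, non-null list means that
    // no regular column is needed (a skipAll query).
    static boolean isSkipAll(List<String> columns) {
        return columns != null && columns.isEmpty();
    }

    public static void main(String[] args) {
        List<String> none = Collections.emptyList();
        System.out.println("legacy:       " + legacyColumns(none));
        System.out.println("pass-through: " + passThroughColumns(none));
        System.out.println("skipAll?      " + isSkipAll(passThroughColumns(none)));
    }
}
```

With the pass-through variant, the decision about how to serve a query that needs no regular columns stays with the reader, which matches the statement in the commit message that the skipAll query should be handled in RecordReader.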
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)