Khurram Faraaz created DRILL-4106:
-------------------------------------
Summary: Redundant Project on top of Scan in query plan
Key: DRILL-4106
URL: https://issues.apache.org/jira/browse/DRILL-4106
Project: Apache Drill
Issue Type: Bug
Components: Execution - Relational Operators
Affects Versions: 1.3.0
Reporter: Khurram Faraaz
Priority: Minor
Why do we see two Projects after the Scan in the query plan?
The table is auto-partitioned by column c1.
Setup: 4-node cluster on CentOS, Drill 1.3, git.commit.id=a639c51c
CTAS statement:
{code}
CREATE TABLE inNstedDirAutoPrtn PARTITION BY (c1) AS
SELECT CAST(columns[0] AS INT) c1,
       CAST(columns[1] AS BIGINT) c2,
       CAST(columns[2] AS CHAR(2)) c3,
       CAST(columns[3] AS VARCHAR(54)) c4,
       CAST(columns[4] AS TIMESTAMP) c5,
       CAST(columns[5] AS DATE) c6,
       CAST(columns[6] AS BOOLEAN) c7,
       CAST(columns[7] AS DOUBLE) c8,
       CAST(columns[8] AS TIME) c9
FROM `nested_dirs/data/csv/allData.csv`;
{code}

Why do we see two Projects on top of the Scan in the query plan? One of them looks redundant.

{code}
0: jdbc:drill:schema=dfs.tmp> explain plan for select * from inNstedDirAutoPrtn where c1 IN (1,2,3,4,-1,0,100,-1710);
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      Project(*=[$0])
00-02        Project(*=[$0])
00-03          Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tmp/inNstedDirAutoPrtn/0_0_48.parquet],
  ReadEntryWithPath [path=/tmp/inNstedDirAutoPrtn/0_0_31.parquet],
  ReadEntryWithPath [path=/tmp/inNstedDirAutoPrtn/0_0_50.parquet],
  ReadEntryWithPath [path=/tmp/inNstedDirAutoPrtn/0_0_47.parquet],
  ReadEntryWithPath [path=/tmp/inNstedDirAutoPrtn/0_0_49.parquet],
  ReadEntryWithPath [path=/tmp/inNstedDirAutoPrtn/0_0_46.parquet]],
  selectionRoot=maprfs:/tmp/inNstedDirAutoPrtn, numFiles=6, usedMetadataFile=false, columns=[`*`]]])
{code}
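Why one of the two Projects looks redundant can be illustrated with a toy model. This is a hypothetical sketch (Java 16+), not Drill's or Calcite's planner code: it assumes every Project only passes input columns through (as `*=[$0]` does), represents such a Project as an index map, and merges adjacent Projects by composing those maps. That is the idea behind Calcite's `ProjectMergeRule`; real Projects can carry arbitrary expressions, which this sketch does not handle.

{code:java}
import java.util.Arrays;

/** Hypothetical toy model; class and method names are illustrative only. */
public class ProjectMergeSketch {

    /** A plan node: Scan/Screen carry no field refs; a Project maps output i to input column fieldRefs[i]. */
    public record Node(String op, int[] fieldRefs, Node input) { }

    /** Compose two pure-reference projections: outer reads inner's output, inner reads its own input. */
    public static int[] compose(int[] outer, int[] inner) {
        int[] merged = new int[outer.length];
        for (int i = 0; i < outer.length; i++) {
            merged[i] = inner[outer[i]];
        }
        return merged;
    }

    /** Bottom-up rewrite that collapses every chain of adjacent Projects into one Project. */
    public static Node mergeProjects(Node node) {
        if (node == null) {
            return null;
        }
        Node input = mergeProjects(node.input());
        if (node.op().equals("Project") && input != null && input.op().equals("Project")) {
            // Skip the child Project and read directly from its input.
            return new Node("Project", compose(node.fieldRefs(), input.fieldRefs()), input.input());
        }
        return new Node(node.op(), node.fieldRefs(), input);
    }

    /** Render a single-input chain top-down, e.g. "Screen <- Project[0] <- Scan". */
    public static String render(Node node) {
        StringBuilder sb = new StringBuilder();
        for (Node n = node; n != null; n = n.input()) {
            sb.append(n.op());
            if (n.fieldRefs() != null) {
                sb.append(Arrays.toString(n.fieldRefs()));
            }
            if (n.input() != null) {
                sb.append(" <- ");
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Same shape as the reported plan: Screen <- Project(*=[$0]) <- Project(*=[$0]) <- Scan.
        Node scan = new Node("Scan", null, null);
        Node plan = new Node("Screen", null,
                new Node("Project", new int[] {0},
                        new Node("Project", new int[] {0}, scan)));
        System.out.println(render(plan));                // Screen <- Project[0] <- Project[0] <- Scan
        System.out.println(render(mergeProjects(plan))); // Screen <- Project[0] <- Scan
    }
}
{code}

Composing `[0]` with `[0]` yields `[0]` again, so the outer Project adds nothing that the inner one does not already produce; after merging, the plan keeps a single Project above the Scan.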