[
https://issues.apache.org/jira/browse/DRILL-6178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16372408#comment-16372408
]
Paul Rogers commented on DRILL-6178:
------------------------------------
May not be directly relevant to this ticket, but text files are special.
Unprojected columns are blank, not null. The reasoning is apparently that if
the column did exist, it could never be null (as CSV only supports blanks, not
nulls). So, to ensure that non-existent columns are compatible with existing
columns, the non-existent columns are defined as blank non-nullable Varchar.
To be clear, imagine we have two files, one with (a), the other with (a, b). We
do {{SELECT a, b FROM ourFile.csv}}. When reading the first file, b is missing,
so we make it an empty non-nullable Varchar. In the second file, column b
exists and is defined as a non-nullable Varchar. Since the two columns have the
same name and type, they can be merged later in, say, a Merge Receiver.
Given this explanation, it is not clear why the example output has null
columns. It should have blank columns as in the second column of the example
output.
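To illustrate the fill rule described above, here is a minimal Python sketch. It is not Drill's reader code; the {{project}} helper and the explicit header argument are made up for the example. It only shows that a missing column filled with an empty string ends up with the same type as a real CSV column, so rows from both files can be combined.

```python
import csv
import io


def project(csv_text, header, wanted):
    """Project the `wanted` columns out of CSV text.

    Hypothetical helper, not a Drill API. Columns that the file does not
    contain are emitted as "" (blank), never None, mirroring the
    "blank non-nullable Varchar" rule for text files.
    """
    index = {name: i for i, name in enumerate(header)}
    rows = []
    for record in csv.reader(io.StringIO(csv_text)):
        rows.append(tuple(
            # Use the real value when the column exists in this file,
            # otherwise fall back to an empty string.
            record[index[name]] if name in index and index[name] < len(record) else ""
            for name in wanted
        ))
    return rows


# First file has only column a; second file has columns a and b.
first = project("1\n2\n", header=["a"], wanted=["a", "b"])
second = project("3,x\n4,\n", header=["a", "b"], wanted=["a", "b"])

# Both results have the same shape and value type, so they merge cleanly.
merged = first + second
```

In this sketch a blank b that was read from the second file is indistinguishable from a b that was filled in for the first file, which is the point of the rule.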
> Drill does not project extra columns in some cases
> --------------------------------------------------
>
> Key: DRILL-6178
> URL: https://issues.apache.org/jira/browse/DRILL-6178
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Relational Operators
> Affects Versions: 1.12.0
> Reporter: Robert Hou
> Assignee: Pritesh Maker
> Priority: Major
> Attachments: 10.tbl
>
>
> Drill is supposed to project extra columns as null columns. This table has
> 10 columns. The extra columns are shown as null:
> {noformat}
> 0: jdbc:drill:zk=10.10.104.85:5181> select columns[0], columns[3],
> columns[4], columns[5], columns[6], columns[7], columns[8], columns[9],
> columns[10], columns[11], columns[12], columns[13], columns[14], columns[15]
> from `resource-manager/1.tbl`;
> +---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+----------+----------+----------+----------+
> | EXPR$0 | EXPR$1 | EXPR$2 | EXPR$3 | EXPR$4 | EXPR$5 | EXPR$6 | EXPR$7 | EXPR$8 | EXPR$9 | EXPR$10 | EXPR$11 | EXPR$12 | EXPR$13 |
> +---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+----------+----------+----------+----------+
> | 1 | | null | null | null | null | -61 | -255.0 | null | null | null | null | null | null |
> +---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+----------+----------+----------+----------+{noformat}
>
> If I run the same query against a table with 10 rows and 10 columns (attached
> to the Jira), only the 10 columns are shown.
>
> {noformat}
> select columns[0], columns[1], columns[2], columns[3], columns[4],
> columns[5], columns[6], columns[7], columns[8], columns[9], columns[10],
> columns[11], columns[12], columns[13], columns[14], columns[15] from
> `10.tbl`{noformat}
>
>
> 5kwidecolumns_500k.tbl