[ https://issues.apache.org/jira/browse/DRILL-5747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16144475#comment-16144475 ]
Paul Rogers commented on DRILL-5747:
------------------------------------
This change is already being made as part of the revised {{ScanBatch}}, which
is part of the project to limit Drill batch sizes.
> Drill should put directory name field in same sequence w.r.t regular column
> for select * query
> ----------------------------------------------------------------------------------------------
>
> Key: DRILL-5747
> URL: https://issues.apache.org/jira/browse/DRILL-5747
> Project: Apache Drill
> Issue Type: Bug
> Reporter: Jinfeng Ni
> Assignee: Jinfeng Ni
>
> Today, the star column * in Drill expands into a list of regular columns
> plus the directory name fields such as dir0 and dir1. However, Drill does not
> place the directory name fields consistently relative to the regular columns.
> For instance, for Parquet files, dir0 is placed after the list of regular
> columns.
> {code}
> select * from dfs.tmp.parquetTbl where dir0 = 1990;
> +--------------+--------------+--------------+--------------+-------+
> | N_NATIONKEY | N_NAME | N_REGIONKEY | N_COMMENT | dir0 |
> +--------------+--------------+--------------+--------------+-------+
> | 0 | [B@5527446 | 0 | [B@684fa264 | 1990 |
> | 1 | [B@442e88bc | 1 | [B@4b13119c | 1990 |
> | 2 | [B@50e93f45 | 1 | [B@138f483 | 1990 |
> | 3 | [B@423cc515 | 1 | [B@23af07ac | 1990 |
> | 4 | [B@3820bf81 | 4 | [B@6dfccaf0 | 1990 |
> | 5 | [B@6f6f8af9 | 0 | [B@40d1a97 | 1990 |
> | 6 | [B@784cb194 | 3 | [B@731ea93f | 1990 |
> | 7 | [B@61f9a224 | 3 | [B@4c041bbc | 1990 |
> | 8 | [B@21b8faa1 | 2 | [B@774e7152 | 1990 |
> | 9 | [B@3ef1fbaf | 2 | [B@c2be72 | 1990 |
> | 10 | [B@71652ec1 | 4 | [B@29e0bb10 | 1990 |
> | 11 | [B@61192cea | 4 | [B@3bd3e873 | 1990 |
> | 12 | [B@5541f4b4 | 2 | [B@5d288126 | 1990 |
> | 13 | [B@e371592 | 4 | [B@42692b88 | 1990 |
> | 14 | [B@6a90fc8 | 0 | [B@454b16e2 | 1990 |
> | 15 | [B@44cb72f8 | 0 | [B@8e91b11 | 1990 |
> | 16 | [B@7feffda8 | 0 | [B@64f66236 | 1990 |
> | 17 | [B@6ba9fb02 | 1 | [B@649e7786 | 1990 |
> | 18 | [B@5fb93205 | 2 | [B@7783175b | 1990 |
> | 19 | [B@3f7294a9 | 3 | [B@7b7e03c9 | 1990 |
> | 20 | [B@e2ac076 | 4 | [B@18c18a3e | 1990 |
> | 21 | [B@4a5af924 | 2 | [B@1a9ad09f | 1990 |
> | 22 | [B@29f6845e | 3 | [B@776c4cd7 | 1990 |
> | 23 | [B@6728f481 | 3 | [B@31cc7610 | 1990 |
> | 24 | [B@665b2dfa | 1 | [B@6c27ac95 | 1990 |
> +--------------+--------------+--------------+--------------+-------+
> {code}
> Notice that in the above output, dir0 (= 1990) is the last column.
> However, for JSON, dir0 is put in front of the list of regular columns.
> {code}
> select * from dfs.tmp.jsonTbl where dir0 = 1990;
> +-------+------+
> | dir0 | a |
> +-------+------+
> | 1990 | 100 |
> | 1990 | 200 |
> +-------+------+
> {code}
> It would be good to present the directory name fields in the same position
> regardless of file format or storage plugin. IMHO, it makes sense to put the
> directory name fields in front of the list of regular columns (the behavior
> that the JSON format exhibits today).
> This ticket is opened to modify Drill's ScanBatch code for that purpose.
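
The ordering proposed in the description could be sketched roughly as follows. This is only an illustration, not Drill's actual ScanBatch code: the class {{StarColumnOrder}} and the method {{dirColumnsFirst}} are hypothetical names, and the sketch assumes that directory name fields can be recognized purely by the dirN naming pattern shown in the ticket.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Drill: reorders an expanded star-column list
// so that directory name fields come before the regular columns.
public class StarColumnOrder {

    // Assumption: directory name fields are named dir0, dir1, ... as in the ticket.
    private static final Pattern DIR_COLUMN = Pattern.compile("dir\\d+");

    // Returns the columns with directory name fields first. The relative order
    // inside each group (directory fields, regular columns) is preserved.
    public static List<String> dirColumnsFirst(List<String> columns) {
        List<String> dirs = new ArrayList<>();
        List<String> regular = new ArrayList<>();
        for (String name : columns) {
            if (DIR_COLUMN.matcher(name).matches()) {
                dirs.add(name);
            } else {
                regular.add(name);
            }
        }
        List<String> ordered = new ArrayList<>(dirs);
        ordered.addAll(regular);
        return ordered;
    }

    public static void main(String[] args) {
        // Column order reported for the Parquet table in the ticket.
        List<String> parquetOrder = Arrays.asList(
            "N_NATIONKEY", "N_NAME", "N_REGIONKEY", "N_COMMENT", "dir0");
        // prints [dir0, N_NATIONKEY, N_NAME, N_REGIONKEY, N_COMMENT]
        System.out.println(dirColumnsFirst(parquetOrder));
    }
}
```

Applied to the JSON example, where the order is already dir0 followed by a, the same helper would leave the list unchanged, so both formats would end up with the directory name fields in front.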
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)