Jinfeng Ni created DRILL-5747:
---------------------------------
Summary: Drill should put directory name field in same sequence
w.r.t regular column for select * query
Key: DRILL-5747
URL: https://issues.apache.org/jira/browse/DRILL-5747
Project: Apache Drill
Issue Type: Bug
Reporter: Jinfeng Ni
Assignee: Jinfeng Ni
Today, the star column * in Drill expands into the list of regular columns
plus the directory name fields such as dir0 and dir1. However, Drill does not
place the directory name fields consistently relative to the regular columns.
For instance, for Parquet files, dir0 is placed after the list of regular columns.
{code}
select * from dfs.tmp.parquetTbl where dir0 = 1990;
+--------------+--------------+--------------+--------------+-------+
| N_NATIONKEY  | N_NAME       | N_REGIONKEY  | N_COMMENT    | dir0  |
+--------------+--------------+--------------+--------------+-------+
| 0            | [B@5527446   | 0            | [B@684fa264  | 1990  |
| 1            | [B@442e88bc  | 1            | [B@4b13119c  | 1990  |
| 2            | [B@50e93f45  | 1            | [B@138f483   | 1990  |
| 3            | [B@423cc515  | 1            | [B@23af07ac  | 1990  |
| 4            | [B@3820bf81  | 4            | [B@6dfccaf0  | 1990  |
| 5            | [B@6f6f8af9  | 0            | [B@40d1a97   | 1990  |
| 6            | [B@784cb194  | 3            | [B@731ea93f  | 1990  |
| 7            | [B@61f9a224  | 3            | [B@4c041bbc  | 1990  |
| 8            | [B@21b8faa1  | 2            | [B@774e7152  | 1990  |
| 9            | [B@3ef1fbaf  | 2            | [B@c2be72    | 1990  |
| 10           | [B@71652ec1  | 4            | [B@29e0bb10  | 1990  |
| 11           | [B@61192cea  | 4            | [B@3bd3e873  | 1990  |
| 12           | [B@5541f4b4  | 2            | [B@5d288126  | 1990  |
| 13           | [B@e371592   | 4            | [B@42692b88  | 1990  |
| 14           | [B@6a90fc8   | 0            | [B@454b16e2  | 1990  |
| 15           | [B@44cb72f8  | 0            | [B@8e91b11   | 1990  |
| 16           | [B@7feffda8  | 0            | [B@64f66236  | 1990  |
| 17           | [B@6ba9fb02  | 1            | [B@649e7786  | 1990  |
| 18           | [B@5fb93205  | 2            | [B@7783175b  | 1990  |
| 19           | [B@3f7294a9  | 3            | [B@7b7e03c9  | 1990  |
| 20           | [B@e2ac076   | 4            | [B@18c18a3e  | 1990  |
| 21           | [B@4a5af924  | 2            | [B@1a9ad09f  | 1990  |
| 22           | [B@29f6845e  | 3            | [B@776c4cd7  | 1990  |
| 23           | [B@6728f481  | 3            | [B@31cc7610  | 1990  |
| 24           | [B@665b2dfa  | 1            | [B@6c27ac95  | 1990  |
+--------------+--------------+--------------+--------------+-------+
{code}
Notice that in the output above, dir0 is the last column.
However, for JSON, dir0 is placed in front of the list of regular columns.
{code}
select * from dfs.tmp.jsonTbl where dir0 = 1990;
+-------+------+
| dir0  | a    |
+-------+------+
| 1990  | 100  |
| 1990  | 200  |
+-------+------+
{code}
It would be good to present the directory name fields in the same position
regardless of file format or storage plugin. IMHO, it makes sense to put the
directory name fields in front of the list of regular columns (the behavior
that the JSON format exhibits today).
This ticket is opened to modify Drill's ScanBatch code for the purpose
explained above.
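To illustrate the proposed ordering, here is a minimal standalone sketch. It is
not Drill's ScanBatch code: the class DirColumnsFirst and the method reorder are
hypothetical names, and the sketch assumes that directory name fields can be
recognized by the default "dir" prefix followed by a level number. It only
shows the intended result, with directory name fields moved in front and the
relative order of the remaining columns left unchanged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of Drill.
public class DirColumnsFirst {
    // Assumption: directory name fields look like dir0, dir1, ...
    private static final Pattern DIR_COLUMN = Pattern.compile("dir\\d+");

    // Returns a new list with directory name fields first, followed by the
    // regular columns. Relative order within each group is preserved.
    public static List<String> reorder(List<String> columns) {
        List<String> dirColumns = new ArrayList<>();
        List<String> regularColumns = new ArrayList<>();
        for (String name : columns) {
            if (DIR_COLUMN.matcher(name).matches()) {
                dirColumns.add(name);
            } else {
                regularColumns.add(name);
            }
        }
        List<String> ordered = new ArrayList<>(dirColumns);
        ordered.addAll(regularColumns);
        return ordered;
    }

    public static void main(String[] args) {
        // Column order as produced for the parquet table above.
        System.out.println(reorder(Arrays.asList(
            "N_NATIONKEY", "N_NAME", "N_REGIONKEY", "N_COMMENT", "dir0")));
        // Column order as produced for the JSON table above (already dir-first).
        System.out.println(reorder(Arrays.asList("dir0", "a")));
    }
}
```

With the parquet column list, the sketch yields dir0 followed by the four
regular columns; the JSON column list comes back unchanged. A name-based check
like this would misclassify a regular column that happens to be named dir0, so
a real change would more likely rely on what the scan already knows about
which fields come from the directory path.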
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)