Jinfeng Ni created DRILL-5747:
---------------------------------

             Summary: Drill should put directory name field in same sequence 
w.r.t regular column for select * query
                 Key: DRILL-5747
                 URL: https://issues.apache.org/jira/browse/DRILL-5747
             Project: Apache Drill
          Issue Type: Bug
            Reporter: Jinfeng Ni
            Assignee: Jinfeng Ni


Today, the star column * in Drill expands into a list of regular columns plus 
the directory name fields such as dir0 and dir1.  However, Drill does not place 
the directory name fields consistently relative to the regular columns.

For instance, for Parquet files, dir0 is placed after the list of regular columns.

{code}
select * from dfs.tmp.parquetTbl where dir0 = 1990;
+--------------+--------------+--------------+--------------+-------+
| N_NATIONKEY  |    N_NAME    | N_REGIONKEY  |  N_COMMENT   | dir0  |
+--------------+--------------+--------------+--------------+-------+
| 0            | [B@5527446   | 0            | [B@684fa264  | 1990  |
| 1            | [B@442e88bc  | 1            | [B@4b13119c  | 1990  |
| 2            | [B@50e93f45  | 1            | [B@138f483   | 1990  |
| 3            | [B@423cc515  | 1            | [B@23af07ac  | 1990  |
| 4            | [B@3820bf81  | 4            | [B@6dfccaf0  | 1990  |
| 5            | [B@6f6f8af9  | 0            | [B@40d1a97   | 1990  |
| 6            | [B@784cb194  | 3            | [B@731ea93f  | 1990  |
| 7            | [B@61f9a224  | 3            | [B@4c041bbc  | 1990  |
| 8            | [B@21b8faa1  | 2            | [B@774e7152  | 1990  |
| 9            | [B@3ef1fbaf  | 2            | [B@c2be72    | 1990  |
| 10           | [B@71652ec1  | 4            | [B@29e0bb10  | 1990  |
| 11           | [B@61192cea  | 4            | [B@3bd3e873  | 1990  |
| 12           | [B@5541f4b4  | 2            | [B@5d288126  | 1990  |
| 13           | [B@e371592   | 4            | [B@42692b88  | 1990  |
| 14           | [B@6a90fc8   | 0            | [B@454b16e2  | 1990  |
| 15           | [B@44cb72f8  | 0            | [B@8e91b11   | 1990  |
| 16           | [B@7feffda8  | 0            | [B@64f66236  | 1990  |
| 17           | [B@6ba9fb02  | 1            | [B@649e7786  | 1990  |
| 18           | [B@5fb93205  | 2            | [B@7783175b  | 1990  |
| 19           | [B@3f7294a9  | 3            | [B@7b7e03c9  | 1990  |
| 20           | [B@e2ac076   | 4            | [B@18c18a3e  | 1990  |
| 21           | [B@4a5af924  | 2            | [B@1a9ad09f  | 1990  |
| 22           | [B@29f6845e  | 3            | [B@776c4cd7  | 1990  |
| 23           | [B@6728f481  | 3            | [B@31cc7610  | 1990  |
| 24           | [B@665b2dfa  | 1            | [B@6c27ac95  | 1990  |
+--------------+--------------+--------------+--------------+-------+
{code}
Notice that in the above output, dir0 (value 1990) is the last column.

However, for JSON, dir0 is put in front of the list of regular columns.

{code}
select * from dfs.tmp.jsonTbl where dir0 = 1990;
+-------+------+
| dir0  |  a   |
+-------+------+
| 1990  | 100  |
| 1990  | 200  |
+-------+------+
{code}

It would be good to present the directory name fields in the same position 
regardless of file format or storage plugin. IMHO, it makes sense to put the 
directory name fields in front of the list of regular columns (the behavior 
that the JSON format exhibits today).

This ticket is opened to modify Drill's ScanBatch code so that the directory 
name fields are placed consistently, as described above.
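
As a rough illustration of the intended ordering only (this is not Drill's 
actual ScanBatch code), the Java sketch below moves directory name fields in 
front of the regular columns while keeping the relative order within each 
group. The class DirFirstOrdering and the method dirColumnsFirst are 
hypothetical names, and the sketch assumes directory fields follow the default 
dirN naming (dir0, dir1, ...).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Drill: illustrates the proposed column order.
final class DirFirstOrdering {
    // Assumption: directory name fields use the default "dir" label plus an index.
    private static final Pattern DIR_FIELD = Pattern.compile("dir\\d+");

    // Returns a new list with directory name fields first, then regular columns.
    // The relative order inside each group is preserved.
    static List<String> dirColumnsFirst(List<String> columns) {
        List<String> dirs = new ArrayList<>();
        List<String> regular = new ArrayList<>();
        for (String column : columns) {
            if (DIR_FIELD.matcher(column).matches()) {
                dirs.add(column);
            } else {
                regular.add(column);
            }
        }
        List<String> ordered = new ArrayList<>(dirs);
        ordered.addAll(regular);
        return ordered;
    }
}
```

Applied to the Parquet column list above (N_NATIONKEY, N_NAME, N_REGIONKEY, 
N_COMMENT, dir0), this would yield dir0 first, matching what the JSON reader 
produces today.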




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
