[ https://issues.apache.org/jira/browse/DRILL-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jinfeng Ni updated DRILL-1499: ------------------------------ Attachment: 0001-DRILL-1499-Drill-return-same-column-order-for-a-sche.patch > Different column order could appear in the result set for a schema-less > select * query, even there are no changing schemas. > --------------------------------------------------------------------------------------------------------------------------- > > Key: DRILL-1499 > URL: https://issues.apache.org/jira/browse/DRILL-1499 > Project: Apache Drill > Issue Type: Bug > Reporter: Jinfeng Ni > Assignee: Jinfeng Ni > Attachments: > 0001-DRILL-1499-Drill-return-same-column-order-for-a-sche.patch > > > For a select * query referring to a schema-less table, Drill could return > different column, depending on the physical operators the query involves: > Q1: > {code} > select * from cp.`employee.json` limit 3; > +-------------+------------+------------+------------+-------------+----------------+------------+---------------+------------+------------+------------+---------------+-----------------+----------------+------------+-----------------+ > | employee_id | full_name | first_name | last_name | position_id | > position_title | store_id | department_id | birth_date | hire_date | > salary | supervisor_id | education_level | marital_status | gender | > management_role | > +-------------+------------+------------+------------+-------------+----------------+------------+---------------+------------+------------+------------+---------------+-----------------+----------------+------------+-----------------+ > {code} > Q2: > {code} > select * from cp.`employee.json` order by last_name limit 3; > +------------+---------------+-----------------+-------------+------------+------------+------------+------------+------------+-----------------+----------------+-------------+----------------+------------+------------+---------------+ > | birth_date | department_id | education_level | employee_id | first_name | > full_name | gender | hire_date | last_name | management_role | > marital_status | position_id | position_title | salary | store_id | > supervisor_id | > +------------+---------------+-----------------+-------------+------------+------------+------------+------------+------------+-----------------+----------------+-------------+----------------+------------+------------+---------------+ > {code} > The difference between Q1 and Q2 is the order by clause. With order by > clause in Q2, Drill will sort the column names alphabetically, while for Q1, > the column names are in the same order as in the data source. > The underlying cause for such difference is that the sort or sort-based > merger operator would require canonicalization, since the incoming batches > could contain different schemas. > However, it would be better that such canonicalization is used only when the > incoming batches have changing schemas. If all the incoming batches have > identical schemas, no need to sort the column orders. With this fix, Drill > will present the same column order in the result set, for a schema-less > select * query, if there is no changing schemas from incoming data sources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)