[ https://issues.apache.org/jira/browse/DRILL-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324661#comment-14324661 ]
Victoria Markman commented on DRILL-1499:
-----------------------------------------
Not fixed, reopening.
Verified in:
#Thu Feb 12 12:13:26 EST 2015
git.commit.id.abbrev=de89f36
{code}
0: jdbc:drill:schema=dfs> select * from alltypes limit 1;
+------------+------------+------------+----------------+--------------+------------+------------+------------+-------------+------------+------------+------------+------------+------------+
| c_varchar | c_integer | c_bigint | c_smalldecimal | c_bigdecimal | c_float | c_date | c_time | c_timestamp | c_boolean | d9 | d18 | d28 | d38 |
+------------+------------+------------+----------------+--------------+------------+------------+------------+-------------+------------+------------+------------+------------+------------+
| IL304488381660192170587 | 451237400 | -3477884857818808320 | 2.2943759150803E9 | 5.1187772583216E9 | 6.4945516E9 | 2015-01-21 | 18:21:06 | 2014-01-17 16:14:25.0 | true | 1032.65 | 1032.6516 | 1032.6516 | 1032.651570 |
+------------+------------+------------+----------------+--------------+------------+------------+------------+-------------+------------+------------+------------+------------+------------+
1 row selected (0.107 seconds)
0: jdbc:drill:schema=dfs> select * from alltypes order by c_date limit 1;
+--------------+------------+------------+------------+------------+------------+----------------+------------+-------------+------------+------------+------------+------------+------------+
| c_bigdecimal | c_bigint | c_boolean | c_date | c_float | c_integer | c_smalldecimal | c_time | c_timestamp | c_varchar | d18 | d28 | d38 | d9 |
+--------------+------------+------------+------------+------------+------------+----------------+------------+-------------+------------+------------+------------+------------+------------+
| -4.0521743567661E9 | -8804872880253829120 | true | 2014-07-20 | 9.9341025E9 | -1263915758 | 8.3573562419042E9 | 02:15:10 | 2014-04-24 13:20:37.0 | ES7123742067108251930856 | 2116.1641 | 2116.1641 | 2116.164140 | 2116.16 |
+--------------+------------+------------+------------+------------+------------+----------------+------------+-------------+------------+------------+------------+------------+------------+
1 row selected (0.49 seconds)
{code}
> Different column order could appear in the result set for a schema-less
> select * query, even there are no changing schemas.
> ---------------------------------------------------------------------------------------------------------------------------
>
> Key: DRILL-1499
> URL: https://issues.apache.org/jira/browse/DRILL-1499
> Project: Apache Drill
> Issue Type: Bug
> Reporter: Jinfeng Ni
> Assignee: Jinfeng Ni
> Fix For: 0.7.0
>
>
> For a select * query referring to a schema-less table, Drill could return
> columns in a different order, depending on the physical operators the query
> involves:
> Q1:
> {code}
> select * from cp.`employee.json` limit 3;
> +-------------+------------+------------+------------+-------------+----------------+------------+---------------+------------+------------+------------+---------------+-----------------+----------------+------------+-----------------+
> | employee_id | full_name | first_name | last_name | position_id | position_title | store_id | department_id | birth_date | hire_date | salary | supervisor_id | education_level | marital_status | gender | management_role |
> +-------------+------------+------------+------------+-------------+----------------+------------+---------------+------------+------------+------------+---------------+-----------------+----------------+------------+-----------------+
> {code}
> Q2:
> {code}
> select * from cp.`employee.json` order by last_name limit 3;
> +------------+---------------+-----------------+-------------+------------+------------+------------+------------+------------+-----------------+----------------+-------------+----------------+------------+------------+---------------+
> | birth_date | department_id | education_level | employee_id | first_name | full_name | gender | hire_date | last_name | management_role | marital_status | position_id | position_title | salary | store_id | supervisor_id |
> +------------+---------------+-----------------+-------------+------------+------------+------------+------------+------------+-----------------+----------------+-------------+----------------+------------+------------+---------------+
> {code}
> The only difference between Q1 and Q2 is the order by clause. With the order
> by clause in Q2, Drill sorts the column names alphabetically, while for Q1
> the column names stay in the same order as in the data source.
> The underlying cause of this difference is that the sort or sort-based
> merger operator requires canonicalization, since the incoming batches
> could contain different schemas.
> However, it would be better to apply such canonicalization only when the
> incoming batches have changing schemas. If all the incoming batches have
> identical schemas, there is no need to sort the columns. With this fix, Drill
> will present the same column order in the result set for a schema-less
> select * query, as long as the schemas of the incoming data do not change.
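The behavior proposed in the description can be illustrated with a minimal sketch. This is not Drill's implementation: the helper name `output_column_order`, the representation of a batch schema as a list of column names, and the choice to canonicalize a changing schema as the alphabetically sorted union of all columns are all assumptions made for illustration only.

```python
from typing import List, Sequence


def output_column_order(batch_schemas: Sequence[Sequence[str]]) -> List[str]:
    """Hypothetical helper: choose the column order for a set of incoming batches.

    Assumption: each batch schema is just an ordered list of column names.
    """
    if not batch_schemas:
        return []

    first = list(batch_schemas[0])

    # If every batch has exactly the same columns in the same order,
    # keep the order from the data source (no canonicalization).
    if all(list(schema) == first for schema in batch_schemas):
        return first

    # Otherwise the schema changes between batches, so fall back to a
    # canonical order. Assumption: sorted union of all column names.
    merged = set()
    for schema in batch_schemas:
        merged.update(schema)
    return sorted(merged)


# Identical schemas: the source order is preserved.
stable = [["full_name", "employee_id", "last_name"]] * 2
print(output_column_order(stable))

# Changing schemas: the columns are canonicalized (sorted).
changing = [["full_name", "employee_id"], ["last_name", "employee_id"]]
print(output_column_order(changing))
```

Under these assumptions, sorting the Q1 column list alphabetically yields exactly the order shown for Q2, which is consistent with the sort operator canonicalizing unconditionally.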