[jira] [Commented] (DRILL-1499) Different column order could appear in the result set for a schema-less select * query, even there are no changing schemas.

Victoria Markman (JIRA) Tue, 17 Feb 2015 10:53:48 -0800

    [ https://issues.apache.org/jira/browse/DRILL-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324661#comment-14324661 ]


Victoria Markman commented on DRILL-1499:
-----------------------------------------

Not fixed, reopening.

Verified in:
 
#Thu Feb 12 12:13:26 EST 2015
git.commit.id.abbrev=de89f36

{code}
0: jdbc:drill:schema=dfs> select * from alltypes limit 1;
+------------+------------+------------+----------------+--------------+------------+------------+------------+-------------+------------+------------+------------+------------+------------+
| c_varchar  | c_integer  |  c_bigint  | c_smalldecimal | c_bigdecimal |  c_float   |   c_date   |   c_time   | c_timestamp | c_boolean  |     d9     |    d18     |    d28     |    d38     |
+------------+------------+------------+----------------+--------------+------------+------------+------------+-------------+------------+------------+------------+------------+------------+
| IL304488381660192170587 | 451237400  | -3477884857818808320 | 2.2943759150803E9 | 5.1187772583216E9 | 6.4945516E9 | 2015-01-21 | 18:21:06   | 2014-01-17 16:14:25.0 | true       | 1032.65    | 1032.6516  | 1032.6516  | 1032.651570 |
+------------+------------+------------+----------------+--------------+------------+------------+------------+-------------+------------+------------+------------+------------+------------+
1 row selected (0.107 seconds)

0: jdbc:drill:schema=dfs> select * from alltypes order by c_date limit 1;
+--------------+------------+------------+------------+------------+------------+----------------+------------+-------------+------------+------------+------------+------------+------------+
| c_bigdecimal |  c_bigint  | c_boolean  |   c_date   |  c_float   | c_integer  | c_smalldecimal |   c_time   | c_timestamp | c_varchar  |    d18     |    d28     |    d38     |     d9     |
+--------------+------------+------------+------------+------------+------------+----------------+------------+-------------+------------+------------+------------+------------+------------+
| -4.0521743567661E9 | -8804872880253829120 | true       | 2014-07-20 | 9.9341025E9 | -1263915758 | 8.3573562419042E9 | 02:15:10   | 2014-04-24 13:20:37.0 | ES7123742067108251930856 | 2116.1641  | 2116.1641  | 2116.164140 | 2116.16    |
+--------------+------------+------------+------------+------------+------------+----------------+------------+-------------+------------+------------+------------+------------+------------+
1 row selected (0.49 seconds)

{code}
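
The two header rows above can be compared mechanically. The following is a minimal sketch, not part of Drill or its test suite; the helper name parse_header and the shortened header strings are assumptions made for illustration. It shows that both queries return the same set of columns, and that the ORDER BY result is simply the plain result's columns in alphabetical order.

```python
from typing import List


def parse_header(row: str) -> List[str]:
    """Split a sqlline-style header row ('| a | b |') into column names."""
    return [cell.strip() for cell in row.strip().strip("|").split("|")]


# Header rows copied from the two queries above (padding collapsed).
plain_header = (
    "| c_varchar | c_integer | c_bigint | c_smalldecimal | c_bigdecimal "
    "| c_float | c_date | c_time | c_timestamp | c_boolean "
    "| d9 | d18 | d28 | d38 |"
)
order_by_header = (
    "| c_bigdecimal | c_bigint | c_boolean | c_date | c_float | c_integer "
    "| c_smalldecimal | c_time | c_timestamp | c_varchar "
    "| d18 | d28 | d38 | d9 |"
)

plain_cols = parse_header(plain_header)
order_by_cols = parse_header(order_by_header)

same_columns = set(plain_cols) == set(order_by_cols)
same_order = plain_cols == order_by_cols
alphabetical = sorted(plain_cols) == order_by_cols

print("same columns:", same_columns)
print("same order:", same_order)
print("ORDER BY result is alphabetical:", alphabetical)
```

A check of this kind could be run against any pair of select * results from the same table to confirm whether the column order is stable.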

> Different column order could appear in the result set for a schema-less 
> select * query, even there are no changing schemas.
> ---------------------------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-1499
>                 URL: https://issues.apache.org/jira/browse/DRILL-1499
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Jinfeng Ni
>            Assignee: Jinfeng Ni
>             Fix For: 0.7.0
>
>
> For a select * query referring to a schema-less table, Drill could return 
> columns in a different order, depending on the physical operators the query 
> involves:
> Q1:
> {code}
> select * from cp.`employee.json` limit 3;
> +-------------+------------+------------+------------+-------------+----------------+------------+---------------+------------+------------+------------+---------------+-----------------+----------------+------------+-----------------+
> | employee_id | full_name  | first_name | last_name  | position_id | position_title |  store_id  | department_id | birth_date | hire_date  |   salary   | supervisor_id | education_level | marital_status |   gender   | management_role |
> +-------------+------------+------------+------------+-------------+----------------+------------+---------------+------------+------------+------------+---------------+-----------------+----------------+------------+-----------------+
> {code}
> Q2:
> {code}
> select * from cp.`employee.json` order by last_name limit 3;
> +------------+---------------+-----------------+-------------+------------+------------+------------+------------+------------+-----------------+----------------+-------------+----------------+------------+------------+---------------+
> | birth_date | department_id | education_level | employee_id | first_name | full_name  |   gender   | hire_date  | last_name  | management_role | marital_status | position_id | position_title |   salary   |  store_id  | supervisor_id |
> +------------+---------------+-----------------+-------------+------------+------------+------------+------------+------------+-----------------+----------------+-------------+----------------+------------+------------+---------------+
> {code}
> The only difference between Q1 and Q2 is the order by clause. With the 
> order by clause in Q2, Drill sorts the column names alphabetically, while 
> for Q1 the column names are in the same order as in the data source. 
> The underlying cause of this difference is that the sort or sort-based 
> merger operator requires canonicalization, since the incoming batches 
> could contain different schemas. 
> However, it would be better to apply such canonicalization only when the 
> incoming batches have changing schemas. If all the incoming batches have 
> identical schemas, there is no need to sort the columns. With this fix, 
> Drill will present the same column order in the result set for a 
> schema-less select * query if the schemas of the incoming data do not 
> change. 
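
The behavior proposed in the description can be sketched roughly as follows. This is an illustration only, not Drill's actual implementation: the function name output_column_order, the representation of a batch schema as a list of column names, and the choice to take the union of columns when schemas differ are all assumptions.

```python
from typing import List, Sequence


def output_column_order(batch_schemas: Sequence[Sequence[str]]) -> List[str]:
    """Choose the output column order for a series of incoming batch schemas.

    Hypothetical helper: each schema is just an ordered list of column names.
    """
    if not batch_schemas:
        return []

    first = list(batch_schemas[0])

    # All batches agree on columns and order: keep the data source order.
    if all(list(schema) == first for schema in batch_schemas[1:]):
        return first

    # Schemas differ: fall back to a canonical (alphabetical) order over
    # every column seen. Taking the union here is an assumption.
    all_columns = {name for schema in batch_schemas for name in schema}
    return sorted(all_columns)


# Identical schemas: source order is preserved.
stable = output_column_order([
    ["employee_id", "full_name", "first_name"],
    ["employee_id", "full_name", "first_name"],
])

# Changing schemas: columns are canonicalized alphabetically.
changing = output_column_order([
    ["employee_id", "full_name"],
    ["employee_id", "birth_date"],
])

print(stable)
print(changing)
```

The point of the sketch is that sorting is the fallback rather than the default, so a query such as Q2 over batches with a single schema would keep the same column order as Q1.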



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
