[
https://issues.apache.org/jira/browse/DRILL-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370069#comment-14370069
]
Jinfeng Ni commented on DRILL-2458:
-----------------------------------
The extra column does not appear to be related to the LocalExchange code; the
same thing can happen to any other regular column. I was able to reproduce a
similar issue as follows.
{code}
create table dfs.tmp.region_sort as select *, r_regionkey + 1 from
cp.`tpch/region.parquet` order by r_name;
+------------+---------------------------+
| Fragment | Number of records written |
+------------+---------------------------+
| 0_0 | 5 |
+------------+---------------------------+
1 row selected (2.556 seconds)
{code}
Then run a query against the created Parquet file.
{code}
0: jdbc:drill:zk=local> select * from dfs.tmp.region_sort;
+-------------+------------+------------+------------+------------+------------+
| r_regionkey | r_name | r_comment | EXPR$1 | r_name0 | EXPR$10 |
+-------------+------------+------------+------------+------------+------------+
| 0 | AFRICA | lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to | 1 | AFRICA | 1 |
| 1 | AMERICA | hs use ironic, even requests. s | 2 | AMERICA | 2 |
| 2 | ASIA | ges. thinly even pinto beans ca | 3 | ASIA | 3 |
| 3 | EUROPE | ly final courts cajole furiously final excuse | 4 | EUROPE | 4 |
| 4 | MIDDLE EAST | uickly special accounts cajole carefully blithely close requests. carefully final asymptotes haggle furiousl | 5 | MIDDLE EAST | 5 |
+-------------+------------+------------+------------+------------+------------+
5 rows selected (0.108 seconds)
{code}
We can see that two additional columns, r_name0 and EXPR$10, were added to the
output Parquet file.
The root cause of this problem is in the star column handling: we should add a
prefix to the star column, so that the execution operators can distinguish the
regular columns expanded from the star from the columns and expressions
referenced explicitly in the query.
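To illustrate the idea, here is a minimal Python sketch (not Drill's actual code). It assumes that name collisions are resolved by appending a numeric suffix, which is consistent with the r_name0 column seen above, and it models only the ORDER BY key, not the duplicated expression. The prefix `T0`, the delimiter `||`, and all function names are hypothetical and chosen only for this example.

```python
# Hypothetical prefix and delimiter, chosen for illustration only.
STAR_PREFIX = "T0"
DELIMITER = "||"


def expand_star(table_columns, prefix=None):
    """Expand `*` into column names, optionally tagging each with a prefix."""
    if prefix is None:
        return list(table_columns)
    return [f"{prefix}{DELIMITER}{name}" for name in table_columns]


def uniquify(names):
    """Make names unique by appending a numeric suffix to repeats."""
    seen = set()
    result = []
    for name in names:
        candidate, suffix = name, 0
        while candidate in seen:
            candidate = f"{name}{suffix}"
            suffix += 1
        seen.add(candidate)
        result.append(candidate)
    return result


def split_by_origin(names):
    """Separate star-expanded columns (prefix stripped) from the rest."""
    tag = STAR_PREFIX + DELIMITER
    from_star, referenced = [], []
    for name in names:
        if name.startswith(tag):
            from_star.append(name[len(tag):])
        else:
            referenced.append(name)
    return from_star, referenced


table_columns = ["r_regionkey", "r_name", "r_comment"]
order_by_keys = ["r_name"]  # column referenced explicitly by ORDER BY

# Without a prefix, the ORDER BY key collides with the star-expanded column
# and gets renamed, so it looks like one more regular column.
without_prefix = uniquify(expand_star(table_columns) + order_by_keys)

# With a prefix, the two names differ, and the origin of each column is known.
with_prefix = uniquify(expand_star(table_columns, STAR_PREFIX) + order_by_keys)
from_star, referenced = split_by_origin(with_prefix)

print(without_prefix)
print(from_star, referenced)
```

In the unprefixed case the renamed key is indistinguishable from a table column, which matches the symptom reported here. In the prefixed case an operator could tell which columns came from the star and which were only referenced by the query.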
> Extra hash column added when running CTAS with order by
> -------------------------------------------------------
>
> Key: DRILL-2458
> URL: https://issues.apache.org/jira/browse/DRILL-2458
> Project: Apache Drill
> Issue Type: Bug
> Reporter: Steven Phillips
> Assignee: Jinfeng Ni
>
> I created the table via the command:
> {code:sql}
> create table dfs.tmp.lineitem_sort as select * from
> dfs.`/drill/SF10/lineitem` order by l_extendedprice desc
> {code}
> This resulted in an extra column when reading the data back:
> {code}
> 0: jdbc:drill:> select * from `lineitem_sort/0_0_0.parquet` limit 1;
> +---------------------------+------------+--------------+------------+-----------------+--------------+--------------+------------+------------+------------+---------------+--------------+------------+----------------+------------+------------+------------+------------------+
> | E_X_P_R_H_A_S_H_F_I_E_L_D | L_COMMENT | L_COMMITDATE | L_DISCOUNT | L_EXTENDEDPRICE | L_LINENUMBER | L_LINESTATUS | L_ORDERKEY | L_PARTKEY | L_QUANTITY | L_RECEIPTDATE | L_RETURNFLAG | L_SHIPDATE | L_SHIPINSTRUCT | L_SHIPMODE | L_SUPPKEY | L_TAX | l_extendedprice0 |
> +---------------------------+------------+--------------+------------+-----------------+--------------+--------------+------------+------------+------------+---------------+--------------+------------+----------------+------------+------------+------------+------------------+
> | -1909175176 | [B@187a06b6 | [B@734ea347 | 0.02 | 104949.5 | 2 | [B@2fc1c575 | 16734176 | 199999 | 50.0 | [B@5a8a9cd1 | [B@423d8bc7 | [B@56a3d7ca | [B@1eac3b36 | [B@3d6365f5 | 50002 | 0.05 | 104949.5 |
> +---------------------------+------------+--------------+------------+-----------------+--------------+--------------+------------+------------+------------+---------------+--------------+------------+----------------+------------+------------+------------+------------------+
> {code}