Jinfeng Ni created DRILL-4392:
---------------------------------
Summary: CTAS with partition writes an internal field into generated parquet files
Key: DRILL-4392
URL: https://issues.apache.org/jira/browse/DRILL-4392
Project: Apache Drill
Issue Type: Bug
Reporter: Jinfeng Ni
Priority: Blocker
On today's master branch:
{code}
select * from sys.version;
+----------------+------------------------------------------+--------------------------------------------------------------------+---------------------------+-------------------+---------------------------+
| version        | commit_id                                | commit_message                                                     | commit_time               | build_email       | build_time                |
+----------------+------------------------------------------+--------------------------------------------------------------------+---------------------------+-------------------+---------------------------+
| 1.5.0-SNAPSHOT | 9a3a5c4ff670a50a49f61f97dd838da59a12f976 | DRILL-4382: Remove dependency on drill-logical from vector package | 16.02.2016 @ 11:58:48 PST | [email protected] | 16.02.2016 @ 17:40:44 PST |
+----------------+------------------------------------------+--------------------------------------------------------------------+---------------------------+-------------------+---------------------------+
{code}
A Parquet table created by Drill's CTAS statement with PARTITION BY contains
an internal field, "P_A_R_T_I_T_I_O_N_C_O_M_P_A_R_A_T_O_R". This extra field
does not affect non-star queries, but it causes incorrect results for star
queries, which return it as an additional column.
{code}
use dfs.tmp;
create table nation_ctas partition by (n_regionkey) as select * from
cp.`tpch/nation.parquet`;
select * from dfs.tmp.nation_ctas limit 6;
+-------------+---------------+-------------+----------------------------------------------------------------------------------------------------------------+---------------------------------------+
| n_nationkey | n_name        | n_regionkey | n_comment                                                                                                      | P_A_R_T_I_T_I_O_N_C_O_M_P_A_R_A_T_O_R |
+-------------+---------------+-------------+----------------------------------------------------------------------------------------------------------------+---------------------------------------+
| 5           | ETHIOPIA      | 0           | ven packages wake quickly. regu                                                                                | true                                  |
| 15          | MOROCCO       | 0           | rns. blithely bold courts among the closely regular packages use furiously bold platelets?                     | false                                 |
| 14          | KENYA         | 0           | pending excuses haggle furiously deposits. pending, express pinto beans wake fluffily past t                   | false                                 |
| 0           | ALGERIA       | 0           | haggle. carefully final deposits detect slyly agai                                                             | false                                 |
| 16          | MOZAMBIQUE    | 0           | s. ironic, unusual asymptotes wake blithely r                                                                  | false                                 |
| 24          | UNITED STATES | 1           | y final packages. slow foxes cajole quickly. quickly silent platelets breach ironic accounts. unusual pinto be | true                                  |
+-------------+---------------+-------------+----------------------------------------------------------------------------------------------------------------+---------------------------------------+
{code}
This effectively breaks every Parquet file created by Drill's CTAS with
PARTITION BY.
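Since non-star queries are reported above as unaffected, a possible stopgap
is to list the columns explicitly instead of using a star. This is only a
sketch against the nation_ctas table from the reproduction; it has not been
verified on this build, and it does not repair the generated files.
{code}
use dfs.tmp;
-- Name the four user columns explicitly so the internal
-- P_A_R_T_I_T_I_O_N_C_O_M_P_A_R_A_T_O_R field is not projected.
select n_nationkey, n_name, n_regionkey, n_comment
from dfs.tmp.nation_ctas
limit 6;
{code}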
It also causes one of the pre-commit functional tests to fail [1].
[1]
https://github.com/mapr/drill-test-framework/blob/master/framework/resources/Functional/ctas/ctas_auto_partition/general/data/drill3361.q