Jinfeng Ni created DRILL-4392:
---------------------------------

             Summary: CTAS with partition writes an internal field into generated parquet files
                 Key: DRILL-4392
                 URL: https://issues.apache.org/jira/browse/DRILL-4392
             Project: Apache Drill
          Issue Type: Bug
            Reporter: Jinfeng Ni
            Priority: Blocker


On today's master branch:

{code}
select * from sys.version;
+-----------------+-------------------------------------------+---------------------------------------------------------------------+----------------------------+-----------------+----------------------------+
|     version     |                 commit_id                 |                           commit_message                            |        commit_time         |   build_email   |         build_time         |
+-----------------+-------------------------------------------+---------------------------------------------------------------------+----------------------------+-----------------+----------------------------+
| 1.5.0-SNAPSHOT  | 9a3a5c4ff670a50a49f61f97dd838da59a12f976  | DRILL-4382: Remove dependency on drill-logical from vector package  | 16.02.2016 @ 11:58:48 PST  | [email protected]  | 16.02.2016 @ 17:40:44 PST  |
+-----------------+-------------------------------------------+---------------------------------------------------------------------+----------------------------+-----------------+----------------------------+
{code}

A Parquet table created by Drill's CTAS statement with a partition clause 
contains an internal field, "P_A_R_T_I_T_I_O_N_C_O_M_P_A_R_A_T_O_R". This 
additional field does not affect non-star queries, but it causes star queries 
to return incorrect results.

{code}
use dfs.tmp;

create table nation_ctas partition by (n_regionkey) as select * from 
cp.`tpch/nation.parquet`;

select * from dfs.tmp.nation_ctas limit 6;
+--------------+----------------+--------------+-----------------------------------------------------------------------------------------------------------------+----------------------------------------+
| n_nationkey  |     n_name     | n_regionkey  |                                                    n_comment                                                    | P_A_R_T_I_T_I_O_N_C_O_M_P_A_R_A_T_O_R  |
+--------------+----------------+--------------+-----------------------------------------------------------------------------------------------------------------+----------------------------------------+
| 5            | ETHIOPIA       | 0            | ven packages wake quickly. regu                                                                                 | true                                   |
| 15           | MOROCCO        | 0            | rns. blithely bold courts among the closely regular packages use furiously bold platelets?                      | false                                  |
| 14           | KENYA          | 0            |  pending excuses haggle furiously deposits. pending, express pinto beans wake fluffily past t                   | false                                  |
| 0            | ALGERIA        | 0            |  haggle. carefully final deposits detect slyly agai                                                             | false                                  |
| 16           | MOZAMBIQUE     | 0            | s. ironic, unusual asymptotes wake blithely r                                                                   | false                                  |
| 24           | UNITED STATES  | 1            | y final packages. slow foxes cajole quickly. quickly silent platelets breach ironic accounts. unusual pinto be  | true                                   |
+--------------+----------------+--------------+-----------------------------------------------------------------------------------------------------------------+----------------------------------------+
{code}
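
For comparison, a non-star query that names each column explicitly should not 
pick up the internal field. A minimal sketch, assuming the nation_ctas table 
created above; the column names are taken from the output above, and no result 
is shown because none was captured for this report:

{code}
use dfs.tmp;

-- Project only the user-visible columns, so the internal
-- P_A_R_T_I_T_I_O_N_C_O_M_P_A_R_A_T_O_R field is never selected.
select n_nationkey, n_name, n_regionkey, n_comment
from dfs.tmp.nation_ctas
limit 6;
{code}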

This effectively breaks every parquet file created by Drill's CTAS with 
partition support.

It also causes one of the pre-commit functional tests to fail [1].

[1] https://github.com/mapr/drill-test-framework/blob/master/framework/resources/Functional/ctas/ctas_auto_partition/general/data/drill3361.q




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
