[ https://issues.apache.org/jira/browse/PARQUET-723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15609272#comment-15609272 ]
Julien Le Dem commented on PARQUET-723:
---------------------------------------

It looks like a bug/missing feature in Hive.
[~spena] What do you think?

> parquet is not storing the type for the column.
> -----------------------------------------------
>
>                 Key: PARQUET-723
>                 URL: https://issues.apache.org/jira/browse/PARQUET-723
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-format
>            Reporter: Narasimha
>
> 1. Create Text file format table
> CREATE EXTERNAL TABLE IF NOT EXISTS emp(
> id INT,
> first_name STRING,
> last_name STRING,
> dateofBirth STRING,
> join_date INT
> )
> COMMENT 'This is Employee Table Date Of Birth of type String'
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ','
> LINES TERMINATED BY '\n'
> STORED AS TEXTFILE
> LOCATION '/user/employee/beforePartition';
> 2. Load the data into table
> load data inpath '/user/somupoc_timestamp/employeeData_partitioned.csv'
> into table emp;
> select * from emp;
> 3. Create Partitioned table with file format as Parquet (dateofBirth STRING))
> create external table emp_afterpartition(
> id int, first_name STRING, last_name STRING, dateofBirth STRING)
> COMMENT 'Employee partitioned table with dateofBirth of type string'
> partitioned by (join_date int)
> STORED as parquet
> LOCATION '/user/employee/afterpartition';
> 4. Fetch the data from Partitioned column
> set hive.exec.dynamic.partition=true;
> set hive.exec.dynamic.partition.mode=nonstrict;
> insert overwrite table emp_afterpartition partition (join_date) select
> * from emp;
> select * from emp_afterpartition;
> 5. Create Partitioned table with file format as Parquet (dateofBirth
> TIMESTAMP))
> CREATE EXTERNAL TABLE IF NOT EXISTS
> employee_afterpartition_timestamp_parq(
> id INT,first_name STRING,last_name STRING,dateofBirth TIMESTAMP)
> COMMENT 'employee partitioned table with dateofBirth of type TIMESTAMP'
> PARTITIONED BY (join_date INT)
> STORED AS PARQUET
> LOCATION '/user/employee/afterpartition';
> select * from employee_afterpartition_timestamp_parq;
> -- 0 records returned
> impala :: alter table employee_afterpartition_timestamp_parq
> RECOVER PARTITIONS;
> Hive :: MSCK REPAIR TABLE
> employee_afterpartition_timestamp_parq;
> -- MSCK works in Hive and RECOVER PARTITIONS works in Impala --
> metastore check command with the repair table option:
> select * from employee_afterpartition_timestamp_parq;
> Actual Result :: Failed with exception
> java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException:
> java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to
> org.apache.hadoop.hive.serde2.io.TimestampWritable
> Expected Result :: Data should display
> Note: if file format is text file instead of Parquet then I am able to fetch
> the data.
> Observation : Two tables having different column type pointing to same
> location(HDFS ).
> sample Data
> =========
> 1,Joyce,Garza,2016-07-17 14:42:18,201607
> 2,Jerry,Ortiz,2016-08-17 21:36:54,201608
> 3,Steven,Ryan,2016-09-10 01:32:40,201609
> 4,Lisa,Black,2015-10-12 15:05:13,201610
> 5,Jose,Turner,2015-011-10 06:38:40,201611
> 6,Joyce,Garza,2016-08-02,201608
> 7,Jerry,Ortiz,2016-01-01,201601
> 8,Steven,Ryan,2016/08/20,201608
> 9,Lisa,Black,2016/09/12,201609
> 10,Jose,Turner,09/19/2016,201609
> 11,Jose,Turner,20160915,201609

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)