[jira] [Commented] (PARQUET-723) parquet is not storing the type for the column.

Julien Le Dem (JIRA) Wed, 26 Oct 2016 11:32:35 -0700

    [ https://issues.apache.org/jira/browse/PARQUET-723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15609272#comment-15609272 ]


Julien Le Dem commented on PARQUET-723:
---------------------------------------

It looks like a bug/missing feature in Hive.
[~spena] What do you think?
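Despite the issue title, every Parquet file records its own schema, including each column's type, in its footer. The files written in step 4 come from a table whose dateofBirth column is STRING, so that column is stored as a UTF8-annotated BINARY column, whereas Hive writes its TIMESTAMP type as INT96. The second table declares TIMESTAMP over those same files, and the reader apparently materializes the value from the file's string type (as `Text`) while the table's TIMESTAMP type expects a `TimestampWritable`. The minimal Python sketch below models that comparison. The names `HIVE_TO_PARQUET` and `find_mismatches` and the toy schemas are hypothetical, and the type mapping is an assumption covering only Hive's default Parquet representation of these three types. It is not Hive's actual code.

```python
# Hypothetical sketch: compare the column types declared in the metastore
# with the physical types recorded in a Parquet file's footer schema.
# The mapping assumes Hive's default Parquet representation for these
# three Hive types; it is illustrative, not Hive's implementation.
HIVE_TO_PARQUET = {
    "int": "INT32",
    "string": "BINARY (UTF8)",
    "timestamp": "INT96",
}


def find_mismatches(table_schema, file_schema):
    """Return (column, declared_type, expected_physical, actual_physical)
    for every column whose file type differs from what the table implies."""
    mismatches = []
    for column, hive_type in table_schema.items():
        expected = HIVE_TO_PARQUET[hive_type]
        actual = file_schema.get(column)
        if actual is not None and actual != expected:
            mismatches.append((column, hive_type, expected, actual))
    return mismatches


# Toy schema of the files written in step 4 (dateofBirth declared STRING).
# join_date is a partition column, so it lives in the directory name,
# not inside the data files.
file_schema = {
    "id": "INT32",
    "first_name": "BINARY (UTF8)",
    "last_name": "BINARY (UTF8)",
    "dateofBirth": "BINARY (UTF8)",
}

# Toy schema of the step-5 table over the same location (TIMESTAMP).
timestamp_table = {
    "id": "int",
    "first_name": "string",
    "last_name": "string",
    "dateofBirth": "timestamp",
}

for column, declared, expected, actual in find_mismatches(timestamp_table, file_schema):
    print(f"{column}: table says {declared} ({expected}), file has {actual}")
# prints: dateofBirth: table says timestamp (INT96), file has BINARY (UTF8)
```

Run against the step-4 table, where dateofBirth is STRING, the same check reports no mismatches. This is consistent with the report that only the TIMESTAMP table fails.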

> parquet is not storing the type for the column.
> -----------------------------------------------
>
>                 Key: PARQUET-723
>                 URL: https://issues.apache.org/jira/browse/PARQUET-723
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-format
>            Reporter: Narasimha
>
> 1. Create a table stored as a text file
>       CREATE EXTERNAL TABLE IF NOT EXISTS emp(
>       id INT,
>       first_name STRING,
>       last_name STRING,
>       dateofBirth STRING,
>       join_date INT
>       )
>       COMMENT 'This is Employee Table Date Of Birth of type String'
>       ROW FORMAT DELIMITED
>       FIELDS TERMINATED BY ','
>       LINES TERMINATED BY '\n'
>       STORED AS TEXTFILE
>       LOCATION '/user/employee/beforePartition';
> 2. Load the data into the table
>       load data inpath '/user/somupoc_timestamp/employeeData_partitioned.csv' into table emp;
>       select * from emp;
> 3. Create a partitioned table stored as Parquet (dateofBirth STRING)
>       create external table emp_afterpartition(
>       id int, first_name STRING, last_name STRING, dateofBirth STRING)
>       COMMENT 'Employee partitioned table with dateofBirth of type string'
>       partitioned by (join_date int)
>       STORED as parquet
>       LOCATION '/user/employee/afterpartition';
> 4. Load the data into the partitioned table (dynamic partitioning on join_date) and query it
>       set hive.exec.dynamic.partition=true;
>       set hive.exec.dynamic.partition.mode=nonstrict;
>       insert overwrite table emp_afterpartition partition (join_date) select * from emp;
>       select * from emp_afterpartition;
> 5. Create a second partitioned Parquet table over the same location, declaring dateofBirth as TIMESTAMP
>       CREATE EXTERNAL TABLE IF NOT EXISTS employee_afterpartition_timestamp_parq(
>       id INT,first_name STRING,last_name STRING,dateofBirth TIMESTAMP)
>       COMMENT 'employee partitioned table with dateofBirth of type TIMESTAMP'
>       PARTITIONED BY (join_date INT)
>       STORED AS PARQUET
>       LOCATION '/user/employee/afterpartition';
>       select * from employee_afterpartition_timestamp_parq;
>         -- 0 records returned: the partition directories exist on HDFS but are not yet registered in the metastore for this table
>       -- Register the partitions (MSCK works in Hive, RECOVER PARTITIONS works in Impala):
>       -- Impala:
>       alter table employee_afterpartition_timestamp_parq RECOVER PARTITIONS;
>       -- Hive (metastore check command with the repair table option):
>       MSCK REPAIR TABLE employee_afterpartition_timestamp_parq;
>       select * from employee_afterpartition_timestamp_parq;
> Actual Result :: Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.hive.serde2.io.TimestampWritable
> Expected Result :: The data should be displayed.
> Note: if the tables are stored as text files instead of Parquet, the data can be fetched.
> Observation: two tables that declare different types for the same column (dateofBirth STRING vs. TIMESTAMP) point to the same HDFS location.
> Sample data
> ===========
> 1,Joyce,Garza,2016-07-17 14:42:18,201607
> 2,Jerry,Ortiz,2016-08-17 21:36:54,201608
> 3,Steven,Ryan,2016-09-10 01:32:40,201609
> 4,Lisa,Black,2015-10-12 15:05:13,201610
> 5,Jose,Turner,2015-011-10 06:38:40,201611
> 6,Joyce,Garza,2016-08-02,201608
> 7,Jerry,Ortiz,2016-01-01,201601
> 8,Steven,Ryan,2016/08/20,201608
> 9,Lisa,Black,2016/09/12,201609
> 10,Jose,Turner,09/19/2016,201609
> 11,Jose,Turner,20160915,201609
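Separately from the Parquet type mismatch, the sample dateofBirth values mix several formats. Only some are in the canonical `yyyy-MM-dd HH:mm:ss` layout that Hive's TIMESTAMP type expects when converting text. The others (date-only values, slash-separated dates, `09/19/2016`, `20160915`, and the three-digit month in `2015-011-10`) would likely not be read as timestamps under default parsing, even from the text table. The minimal Python sketch below uses `datetime.strptime` with `%Y-%m-%d %H:%M:%S` as a stand-in for Hive's parser. This is an approximation: Hive also accepts fractional seconds, and its behaviour for date-only strings may differ between versions. `SAMPLE_DATES` and `parses_as_timestamp` are hypothetical names.

```python
from datetime import datetime

# (id -> dateofBirth) pairs copied from the sample data in the issue.
SAMPLE_DATES = {
    1: "2016-07-17 14:42:18",
    2: "2016-08-17 21:36:54",
    3: "2016-09-10 01:32:40",
    4: "2015-10-12 15:05:13",
    5: "2015-011-10 06:38:40",
    6: "2016-08-02",
    7: "2016-01-01",
    8: "2016/08/20",
    9: "2016/09/12",
    10: "09/19/2016",
    11: "20160915",
}

# Assumption: Hive's canonical TIMESTAMP text layout, approximated
# without optional fractional seconds.
HIVE_TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def parses_as_timestamp(value):
    """True if value matches the 'yyyy-MM-dd HH:mm:ss' layout exactly."""
    try:
        datetime.strptime(value, HIVE_TIMESTAMP_FORMAT)
    except ValueError:
        return False
    return True


valid = [row_id for row_id, value in SAMPLE_DATES.items() if parses_as_timestamp(value)]
invalid = [row_id for row_id, value in SAMPLE_DATES.items() if not parses_as_timestamp(value)]
print("parse as timestamps:", valid)   # [1, 2, 3, 4]
print("do not parse:", invalid)        # [5, 6, 7, 8, 9, 10, 11]
```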



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
