Hello All, I am trying to load a table in ORC format with data coming from another hive table stored in text format. Below is the hive query I am trying. both the tables are partitioned on data_date.
insert overwrite table target_table partition(data_date='2015-09-30') select col1, col2, cast(col3 as date), cast(col4 as date) from stage_table where data_date='2015-09-30' clustered by col1; The query runs fine and loads the data with no issues. When trying to read the data after the insertion, it throws the following error. Failed with exception java.io.IOException:java.lang.ClassCastException: org.apache.hadoop.hive.serde2.io.DateWritable cannot be cast to org.apache.hadoop.io.Text. A simple select statement select col1, col2, cast(col3 as date), cast(col4 as date) from stage_table where data_date='2015-09-30' is producing the records in date format. Does anybody has experienced the same issue earlier? any pointers to debug this issue? The structure of stage and target table are as follows. create external table stage_table ( col1 int, col2 string, col3 timestamp, col4 timestamp) partitioned by (data_date date) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 'maprfs:/dir1/xyz/stage/stage_table' TBLPROPERTIES ( 'serialization.null.format'='', 'skip.header.line.count'='1'); create external table target_table ( col1 int, col2 string, col3 date, col4 date) partitioned by (data_date date) CLUSTERED BY ( col1) SORTED BY ( col1 ASC) INTO 64 BUCKETS ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' LOCATION 'maprfs:/dir1/xyz/orc/target_table' TBLPROPERTIES ( 'serialization.null.format'='');