Hello All,

I am trying to load a table in ORC format with data coming from another hive 
table stored in text format. Below is the hive query I am trying. both the 
tables are partitioned on data_date.

insert overwrite table target_table partition(data_date='2015-09-30')
select col1, col2, cast(col3 as date), cast(col4 as date) from stage_table 
where data_date='2015-09-30' clustered by col1;

The query runs fine and loads the data with no issues. When trying to read the 
data after the insertion, it throws the following error.

Failed with exception java.io.IOException:java.lang.ClassCastException: 
org.apache.hadoop.hive.serde2.io.DateWritable cannot be cast to 
org.apache.hadoop.io.Text.

A simple select statement select col1, col2, cast(col3 as date), cast(col4 as 
date) from stage_table where data_date='2015-09-30' is producing the records in 
date format.  Does anybody has experienced the same issue earlier? any pointers 
to debug this issue?

The structure of stage and target table are as follows.

create external table stage_table (
col1 int,
col2 string,
col3 timestamp,
col4 timestamp)
partitioned by (data_date date)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'maprfs:/dir1/xyz/stage/stage_table'
TBLPROPERTIES (
'serialization.null.format'='',
'skip.header.line.count'='1');

create external table target_table (
col1 int,
col2 string,
col3 date,
col4 date)
partitioned by (data_date date)
CLUSTERED BY (
col1)
SORTED BY (
col1 ASC)
INTO 64 BUCKETS
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
'maprfs:/dir1/xyz/orc/target_table'
TBLPROPERTIES (
'serialization.null.format'='');

Reply via email to