[ https://issues.apache.org/jira/browse/HIVE-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14954622#comment-14954622 ]

David Morel commented on HIVE-7847:
-----------------------------------

I encountered a similar(?) issue yesterday, which is tightly linked to the
use of INSERT OVERWRITE:
1. Create a partitioned ORC table with datatype X for a column.
2. INSERT into a partition.
3. ALTER TABLE to change the column to datatype Y.
4. INSERT OVERWRITE into the partition.
5. Queries fail with "X cannot be cast to Y" (or the other way around; it
doesn't matter).
This happens because, as I checked, the datatype is kept in the Hive
metastore per column and per partition, and INSERT OVERWRITE (with the new
datatype) does not update that partition-level information in the metastore.
So you insert a column of datatype Y (since the table definition now says
so) that Hive, at query time, still thinks is of type X. And it breaks.
Manually fixing the column type in the metastore DB fixes it, as sketched
below.
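
To make the manual fix concrete, here is a sketch against the table from the
repro script quoted below. The ALTER variant needs a Hive version that
supports partition-level CHANGE COLUMN; the direct UPDATE assumes a
MySQL-backed metastore with the stock schema tables (TBLS, PARTITIONS, SDS,
COLUMNS_V2). Treat both as illustrative: verify the statements against your
metastore version and back it up first.
{code}
-- Option 1: realign the stale partition's metadata through Hive DDL
-- (partition-level CHANGE COLUMN is not available on every version).
ALTER TABLE orc_change_type PARTITION (dt='20140718') CHANGE id id bigint;

-- Option 2: edit the metastore DB directly (MySQL dialect). This updates
-- the partition-level type recorded for column id in every partition of
-- orc_change_type that still says 'int'. Back up the metastore first.
UPDATE COLUMNS_V2 c
JOIN SDS s        ON s.CD_ID  = c.CD_ID
JOIN PARTITIONS p ON p.SD_ID  = s.SD_ID
JOIN TBLS t       ON t.TBL_ID = p.TBL_ID
SET c.TYPE_NAME = 'bigint'
WHERE t.TBL_NAME    = 'orc_change_type'
  AND c.COLUMN_NAME = 'id'
  AND c.TYPE_NAME   = 'int';
{code}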

> query orc partitioned table fail when table column type change
> --------------------------------------------------------------
>
>                 Key: HIVE-7847
>                 URL: https://issues.apache.org/jira/browse/HIVE-7847
>             Project: Hive
>          Issue Type: Bug
>          Components: File Formats
>    Affects Versions: 0.11.0, 0.12.0, 0.13.0
>            Reporter: Zhichun Wu
>            Assignee: Zhichun Wu
>         Attachments: HIVE-7847.1.patch, vector_alter_partition_change_col.q
>
>
> I use the following script to test an ORC column type change on a
> partitioned table, on branch-0.13:
> {code}
> use test;
> DROP TABLE if exists orc_change_type_staging;
> DROP TABLE if exists orc_change_type;
> CREATE TABLE orc_change_type_staging (
>     id int
> );
> CREATE TABLE orc_change_type (
>     id int
> ) PARTITIONED BY (`dt` string)
> stored as orc;
> -- load staging table
> LOAD DATA LOCAL INPATH '../hive/examples/files/int.txt' OVERWRITE INTO TABLE orc_change_type_staging;
> -- populate orc hive table
> INSERT OVERWRITE TABLE orc_change_type partition(dt='20140718') select * FROM orc_change_type_staging limit 1;
> -- change column id from int to bigint
> ALTER TABLE orc_change_type CHANGE id id bigint;
> INSERT OVERWRITE TABLE orc_change_type partition(dt='20140719') select * FROM orc_change_type_staging limit 1;
> SELECT id FROM orc_change_type where dt between '20140718' and '20140719';
> {code}
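> At this point the per-partition schemas have already diverged; a quick,
> illustrative way to see it (exact output varies across Hive versions):
> {code}
> DESCRIBE orc_change_type PARTITION (dt='20140718'); -- still reports id as int
> DESCRIBE orc_change_type PARTITION (dt='20140719'); -- reports id as bigint
> DESCRIBE orc_change_type;                           -- table-level schema: id bigint
> {code}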
> It fails in the last query, "SELECT id FROM orc_change_type where dt between
> '20140718' and '20140719';", with the exception:
> {code}
> Error: java.io.IOException: java.io.IOException: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to org.apache.hadoop.io.LongWritable
>         at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>         at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>         at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:256)
>         at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:171)
>         at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:197)
>         at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:183)
>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
> Caused by: java.io.IOException: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to org.apache.hadoop.io.LongWritable
>         at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>         at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>         at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:344)
>         at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:101)
>         at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:41)
>         at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:122)
>         at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:254)
>         ... 11 more
> Caused by: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to org.apache.hadoop.io.LongWritable
>         at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$LongTreeReader.next(RecordReaderImpl.java:717)
>         at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StructTreeReader.next(RecordReaderImpl.java:1788)
>         at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:2997)
>         at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:153)
>         at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:127)
>         at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:339)
>         ... 15 more
> {code}
> The value object is reused each time we deserialize a row, so the read fails
> as soon as we start processing the next path, which has a different schema.
> Resetting the value object each time we finish reading one path solves this
> problem.
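> Assuming that diagnosis, either partition should read fine on its own; the
> cast only blows up once a single reader crosses from one path's schema into
> the next (illustrative):
> {code}
> SELECT id FROM orc_change_type WHERE dt = '20140718'; -- old schema only: ok
> SELECT id FROM orc_change_type WHERE dt = '20140719'; -- new schema only: ok
> SELECT id FROM orc_change_type WHERE dt BETWEEN '20140718' AND '20140719'; -- crosses paths: fails
> {code}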



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
