[ https://issues.apache.org/jira/browse/HIVE-4551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13656384#comment-13656384 ]
Sushanth Sowmyan commented on HIVE-4551:
----------------------------------------

The problem here is that the raw data encapsulated by HCatRecord and the HCatSchema are out of sync, which was one of my worries back in HCATALOG-425: https://issues.apache.org/jira/browse/HCATALOG-425?focusedCommentId=13439652&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13439652

Basically, the raw data contained in the smallint/tinyint columns consists of raw shorts and bytes, and we try to read it as an Int. In the RCFile case, the underlying raw data is also stored as an IntWritable for smallint and tinyint, but that is not so in the ORC case. This leads to the following kinds of calls in the RCFile case versus the ORC case:

RCFILE:
{noformat}
13/05/11 02:56:10 INFO mapreduce.InternalUtil: Initializing org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe with properties {transient_lastDdlTime=1368266162, serialization.null.format=\N, columns=ti,si,i,bi,f,d,b, serialization.format=1, columns.types=int,int,int,bigint,float,double,boolean}
==> org.apache.hadoop.hive.serde2.lazy.LazyInteger:-3
==> org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyIntObjectInspector:int
==> org.apache.hadoop.hive.serde2.lazy.LazyInteger:9001
==> org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyIntObjectInspector:int
==> org.apache.hadoop.hive.serde2.lazy.LazyInteger:86400
==> org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyIntObjectInspector:int
==> org.apache.hadoop.hive.serde2.lazy.LazyLong:4294967297
==> org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyLongObjectInspector:bigint
==> org.apache.hadoop.hive.serde2.lazy.LazyFloat:34.532
==> org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyFloatObjectInspector:float
==> org.apache.hadoop.hive.serde2.lazy.LazyDouble:2.184239842983489E15
==> org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyDoubleObjectInspector:double
==> org.apache.hadoop.hive.serde2.lazy.LazyBoolean:true
==> org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyBooleanObjectInspector:boolean
==> org.apache.hadoop.hive.serde2.lazy.LazyInteger:0
==> org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyIntObjectInspector:int
==> org.apache.hadoop.hive.serde2.lazy.LazyInteger:0
==> org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyIntObjectInspector:int
==> org.apache.hadoop.hive.serde2.lazy.LazyInteger:0
==> org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyIntObjectInspector:int
==> org.apache.hadoop.hive.serde2.lazy.LazyLong:0
==> org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyLongObjectInspector:bigint
==> org.apache.hadoop.hive.serde2.lazy.LazyFloat:0.0
==> org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyFloatObjectInspector:float
==> org.apache.hadoop.hive.serde2.lazy.LazyDouble:0.0
==> org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyDoubleObjectInspector:double
==> org.apache.hadoop.hive.serde2.lazy.LazyBoolean:false
==> org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyBooleanObjectInspector:boolean
{noformat}

ORC:
{noformat}
13/05/11 02:56:16 INFO mapreduce.InternalUtil: Initializing org.apache.hadoop.hive.ql.io.orc.OrcSerde with properties {transient_lastDdlTime=1368266162, serialization.null.format=\N, columns=ti,si,i,bi,f,d,b, serialization.format=1, columns.types=int,int,int,bigint,float,double,boolean}
==> org.apache.hadoop.hive.serde2.io.ByteWritable:-3
==> org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableIntObjectInspector:int
13/05/11 02:56:16 WARN mapred.LocalJobRunner: job_local_0003
org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error converting read value to tuple
	at org.apache.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:76)
	at org.apache.hcatalog.pig.HCatLoader.getNext(HCatLoader.java:53)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:194)
	at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
	at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.io.ByteWritable cannot be cast to org.apache.hadoop.io.IntWritable
	at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableIntObjectInspector.getPrimitiveJavaObject(WritableIntObjectInspector.java:45)
	at org.apache.hcatalog.data.HCatRecordSerDe.serializePrimitiveField(HCatRecordSerDe.java:292)
	at org.apache.hcatalog.data.HCatRecordSerDe.serializeField(HCatRecordSerDe.java:192)
	at org.apache.hcatalog.data.LazyHCatRecord.get(LazyHCatRecord.java:53)
	at org.apache.hcatalog.data.LazyHCatRecord.get(LazyHCatRecord.java:97)
	at org.apache.hcatalog.mapreduce.HCatRecordReader.nextKeyValue(HCatRecordReader.java:203)
	at org.apache.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:63)
	... 8 more
{noformat}

(There is also an additional bug in how these values are read for promotion: the code assumes a Byte where the value is actually a ByteWritable, etc.)

> ORC - HCatLoader integration has issues with smallint/tinyint promotions to Int
> -------------------------------------------------------------------------------
>
>            Key: HIVE-4551
>            URL: https://issues.apache.org/jira/browse/HIVE-4551
>        Project: Hive
>     Issue Type: Bug
>     Components: HCatalog
>       Reporter: Sushanth Sowmyan
>       Assignee: Sushanth Sowmyan
>
> This was initially reported from an e2e test run, with the following E2E test:
> {code}
> {
>   'name' => 'Hadoop_ORC_Write',
>   'tests' => [
>     {
>       'num' => 1
>       ,'hcat_prep'=>q\
> drop table if exists hadoop_orc;
> create table hadoop_orc (
>   t tinyint,
>   si smallint,
>   i int,
>   b bigint,
>   f float,
>   d double,
>   s string)
> stored as orc;\
>       ,'hadoop' => q\jar :FUNCPATH:/testudf.jar org.apache.hcatalog.utils.WriteText -libjars :HCAT_JAR: :THRIFTSERVER: all100k hadoop_orc\,
>       ,'result_table' => 'hadoop_orc'
>       ,'sql' => q\select * from all100k;\
>       ,'floatpostprocess' => 1
>       ,'delimiter' => ' '
>     },
>   ],
> },
> {code}
> This fails with the following error:
> {code}
> 2013-04-26 00:26:07,437 WARN org.apache.hadoop.mapred.Child: Error running child
> org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error converting read value to tuple
> 	at org.apache.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:76)
> 	at org.apache.hcatalog.pig.HCatLoader.getNext(HCatLoader.java:53)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211)
> 	at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
> 	at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:765)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:396)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1195)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:249)
> Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.io.ByteWritable cannot be cast to org.apache.hadoop.io.IntWritable
> 	at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableIntObjectInspector.getPrimitiveJavaObject(WritableIntObjectInspector.java:45)
> 	at org.apache.hcatalog.data.HCatRecordSerDe.serializePrimitiveField(HCatRecordSerDe.java:290)
> 	at org.apache.hcatalog.data.HCatRecordSerDe.serializeField(HCatRecordSerDe.java:192)
> 	at org.apache.hcatalog.data.LazyHCatRecord.get(LazyHCatRecord.java:53)
> 	at org.apache.hcatalog.data.LazyHCatRecord.get(LazyHCatRecord.java:97)
> 	at org.apache.hcatalog.mapreduce.HCatRecordReader.nextKeyValue(HCatRecordReader.java:203)
> 	at org.apache.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:63)
> 	... 12 more
> 2013-04-26 00:26:07,440 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
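The root cause described in the comment (an ObjectInspector that blindly casts to IntWritable while ORC hands back a ByteWritable, plus the missing widening promotion) can be sketched as follows. Note these are simplified stand-in classes for illustration only, NOT the real Hive/Hadoop types, and `promotedIntValue` is a hypothetical helper, not the actual HIVE-4551 patch:

{code}
// Stand-ins mimicking org.apache.hadoop.io.IntWritable and
// org.apache.hadoop.hive.serde2.io.ByteWritable (NOT the real classes).
class IntWritable { int v; IntWritable(int v) { this.v = v; } }
class ByteWritable { byte v; ByteWritable(byte v) { this.v = v; } }

public class PromotionDemo {
    // Mirrors the failing pattern in WritableIntObjectInspector
    // .getPrimitiveJavaObject: an unconditional cast to IntWritable, which
    // works for RCFile's int-backed data but throws ClassCastException when
    // ORC supplies a ByteWritable for a tinyint column.
    static Object getPrimitiveJavaObject(Object o) {
        return ((IntWritable) o).v;
    }

    // Hypothetical sketch of the needed promotion: check the actual writable
    // type and widen byte -> int before reading, instead of casting blindly.
    static int promotedIntValue(Object o) {
        if (o instanceof ByteWritable) {
            return ((ByteWritable) o).v;   // tinyint promoted to int
        }
        return ((IntWritable) o).v;        // already int-backed
    }

    public static void main(String[] args) {
        Object orcValue = new ByteWritable((byte) -3); // what ORC returns
        try {
            getPrimitiveJavaObject(orcValue);          // reproduces the bug
        } catch (ClassCastException e) {
            System.out.println("reproduced: " + e);
        }
        System.out.println(promotedIntValue(orcValue)); // prints -3
    }
}
{code}

In the real codebase the equivalent change would have to live in the HCatRecordSerDe / ObjectInspector layer, keyed off the column's declared type versus the writable actually present in the row.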