[
https://issues.apache.org/jira/browse/HIVE-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15378929#comment-15378929
]
Yibing Shi commented on HIVE-14205:
-----------------------------------
I just found that Hive's current union type implementation fundamentally
conflicts with the Avro implementation.
Currently Hive uses {{UnionObject}} as the value of union type columns. For
example, if we create a table like below:
{noformat}
create table avro_union_test2 (value uniontype<int,bigint>);
{noformat}
We cannot store int or bigint data directly in column "value". Instead, we
have to use the UDF create_union to create a {{UnionObject}} value:
{noformat}
insert overwrite table avro_union_test2 select 1 as value; -- this fails
insert overwrite table avro_union_test2 select create_union(0,1,2L) as value; -- this succeeds
{noformat}
If the table uses the text file format, the data stored in the file looks like this:
{noformat}
0:1
{noformat}
Here 0 is the tag (offset) of the value's type within the union, and 1 is the
actual value. The 2L argument is used only for type checking and is not stored
in the data file at all.
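To make the tag/value pairing concrete, here is a minimal Java sketch. It is only an illustration: {{TaggedUnion}}, {{toText}} and {{fromText}} are hypothetical names, not Hive classes or methods, and the ":" separator simply mirrors the row as displayed above.

```java
import java.util.Objects;

public class UnionTextSketch {
    /** Hypothetical stand-in for a union value: a branch tag plus the actual object. */
    static final class TaggedUnion {
        final byte tag;
        final Object value;

        TaggedUnion(int tag, Object value) {
            this.tag = (byte) tag;
            this.value = Objects.requireNonNull(value);
        }
    }

    /** Renders the value as "tag:value", mirroring the text row shown above. */
    static String toText(TaggedUnion u) {
        return u.tag + ":" + u.value;
    }

    /** Parses "tag:value" for a union shaped like uniontype<int,bigint>. */
    static TaggedUnion fromText(String row) {
        int sep = row.indexOf(':');
        int tag = Integer.parseInt(row.substring(0, sep));
        String raw = row.substring(sep + 1);
        // Assumption: branch 0 is int and branch 1 is bigint.
        // The casts to Object prevent numeric promotion in the ternary.
        Object value = (tag == 0) ? (Object) Integer.valueOf(raw) : (Object) Long.valueOf(raw);
        return new TaggedUnion(tag, value);
    }

    public static void main(String[] args) {
        // Roughly what create_union(0, 1, 2L) selects: tag 0, value 1.
        TaggedUnion u = new TaggedUnion(0, 1);
        System.out.println(toText(u));
        System.out.println(fromText("1:5").value.getClass().getSimpleName());
    }
}
```

The point of the sketch is that the tag travels with the value, so a reader can tell which branch of the union a stored value belongs to.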
AvroSerDe stores data in a similar way: it stores the type offset together with
the actual data. But when reading data, Avro returns the actual data instead of
a {{UnionObject}}:
https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/generic/GenericDatumReader.java#L179
For the above data created by {{create_union}}, the AvroSerDe returns an Integer
instead of a {{UnionObject}}. This makes Hive fail in later operations (writing
to data files or formatting as a JSON string).
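The mismatch can be sketched in plain Java. Everything below is hypothetical and for illustration only: {{TaggedUnion}}, {{describe}} and {{rewrap}} are made-up names, not Hive or Avro APIs, and {{rewrap}} is not the approach taken in the attached patch. It only shows which piece of information (the tag) is missing once the reader hands back a bare value.

```java
import java.util.Arrays;
import java.util.List;

public class UnionMismatchSketch {
    /** Hypothetical stand-in for a union value: a branch tag plus the actual object. */
    static final class TaggedUnion {
        final byte tag;
        final Object value;

        TaggedUnion(int tag, Object value) {
            this.tag = (byte) tag;
            this.value = value;
        }
    }

    /** A consumer that, like a union-aware writer, only accepts the tagged wrapper. */
    static String describe(Object columnValue) {
        if (!(columnValue instanceof TaggedUnion)) {
            throw new IllegalArgumentException(
                "expected a tagged union but got " + columnValue.getClass().getSimpleName());
        }
        TaggedUnion u = (TaggedUnion) columnValue;
        return "{" + u.tag + ":" + u.value + "}";
    }

    /** Hypothetical recovery: infer the tag by matching the runtime class against the branches. */
    static TaggedUnion rewrap(Object bare, List<Class<?>> branches) {
        for (int i = 0; i < branches.size(); i++) {
            if (branches.get(i).isInstance(bare)) {
                return new TaggedUnion(i, bare);
            }
        }
        throw new IllegalArgumentException("value matches no union branch: " + bare);
    }

    public static void main(String[] args) {
        // A reader that resolves the union itself hands back the bare value.
        Object fromReader = Integer.valueOf(1);
        try {
            describe(fromReader);
        } catch (IllegalArgumentException e) {
            System.out.println("fails: " + e.getMessage());
        }

        // Assumed branch order for uniontype<int,bigint>.
        List<Class<?>> branches = Arrays.<Class<?>>asList(Integer.class, Long.class);
        System.out.println(describe(rewrap(fromReader, branches)));
    }
}
```

Matching on the runtime class is ambiguous whenever two branches share a Java type, so this is a way to see the problem rather than a proposed fix.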
I will check to see whether we have a way to fix this.
> Hive doesn't support union type with AVRO file format
> -----------------------------------------------------
>
> Key: HIVE-14205
> URL: https://issues.apache.org/jira/browse/HIVE-14205
> Project: Hive
> Issue Type: Bug
> Components: Serializers/Deserializers
> Reporter: Yibing Shi
> Assignee: Yibing Shi
> Attachments: HIVE-14205.1.patch
>
>
> Reproduce steps:
> {noformat}
> hive> CREATE TABLE avro_union_test
> > PARTITIONED BY (p int)
> > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
> > STORED AS INPUTFORMAT
> 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
> > OUTPUTFORMAT
> 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
> > TBLPROPERTIES ('avro.schema.literal'='{
> > "type":"record",
> > "name":"nullUnionTest",
> > "fields":[
> > {
> > "name":"value",
> > "type":[
> > "null",
> > "int",
> > "long"
> > ],
> > "default":null
> > }
> > ]
> > }');
> OK
> Time taken: 0.105 seconds
> hive> alter table avro_union_test add partition (p=1);
> OK
> Time taken: 0.093 seconds
> hive> select * from avro_union_test;
> FAILED: RuntimeException org.apache.hadoop.hive.ql.metadata.HiveException:
> Failed with exception Hive internal error inside
> isAssignableFromSettablePrimitiveOI void not supported
> yet.java.lang.RuntimeException: Hive internal error inside
> isAssignableFromSettablePrimitiveOI void not supported yet.
> at
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.isInstanceOfSettablePrimitiveOI(ObjectInspectorUtils.java:1140)
> at
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.isInstanceOfSettableOI(ObjectInspectorUtils.java:1149)
> at
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hasAllFieldsSettable(ObjectInspectorUtils.java:1187)
> at
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hasAllFieldsSettable(ObjectInspectorUtils.java:1220)
> at
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hasAllFieldsSettable(ObjectInspectorUtils.java:1200)
> at
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConvertedOI(ObjectInspectorConverters.java:219)
> at
> org.apache.hadoop.hive.ql.exec.FetchOperator.setupOutputObjectInspector(FetchOperator.java:581)
> at
> org.apache.hadoop.hive.ql.exec.FetchOperator.initialize(FetchOperator.java:172)
> at
> org.apache.hadoop.hive.ql.exec.FetchOperator.<init>(FetchOperator.java:140)
> at
> org.apache.hadoop.hive.ql.exec.FetchTask.initialize(FetchTask.java:79)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:482)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:311)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1194)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1289)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1120)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1108)
> at
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:218)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:170)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:381)
> at
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:773)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:691)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:626)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> {noformat}
> Another test case to show this problem is:
> {noformat}
> hive> create table avro_union_test2 (value uniontype<int,bigint>) stored as
> avro;
> OK
> Time taken: 0.053 seconds
> hive> show create table avro_union_test2;
> OK
> CREATE TABLE `avro_union_test2`(
> `value` uniontype<void,int,bigint> COMMENT '')
> ROW FORMAT SERDE
> 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
> STORED AS INPUTFORMAT
> 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
> OUTPUTFORMAT
> 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
> LOCATION
> 'hdfs://localhost/user/hive/warehouse/avro_union_test2'
> TBLPROPERTIES (
> 'transient_lastDdlTime'='1468173589')
> Time taken: 0.051 seconds, Fetched: 12 row(s)
> {noformat}
> Although column {{value}} is defined as {{uniontype<int,bigint>}} in the create
> table command, its type becomes {{uniontype<void,int,bigint>}} after the table
> is created. Hive accidentally turns the nullable definition in the Avro schema
> ({{\["null", "int", "long"\]}}) into part of the union definition.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)