[ https://issues.apache.org/jira/browse/PARQUET-155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258425#comment-14258425 ]
Ryan Blue commented on PARQUET-155:
-----------------------------------
Hive [doesn't currently implement union types in
parquet|https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveSchemaConverter.java#L110],
so that support would have to be planned and implemented before this conversion
could work in Hive.
Parquet-avro does support mapping union types to Parquet: it replaces the union
with a group in which each possible type in the union is a field, and only one
field in that group is defined at a time. You could try writing a simple
map-only job that reads with Avro and writes with Parquet-avro. That way you
wouldn't have to do any conversion. (You should also be able to do this
without writing any code by using the Kite CLI copy command.)
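To make the union-to-group mapping concrete, here is a minimal Python sketch that models it on the schema from this issue. It does not call Parquet or Avro. The helper names `union_to_group` and `encode_value` are hypothetical, and the `member0`, `member1` field names are an assumption about the naming convention Parquet-avro uses, so check the schema it actually writes.

```python
import json

# The reporter's Avro schema, parsed as plain JSON (no Avro library involved).
AVRO_SCHEMA = json.loads("""
{
  "namespace": "com.example.test",
  "type": "record",
  "name": "TestRecord",
  "fields": [{"name": "objectLink", "type": [
    {"type": "record", "name": "TestObj1",
     "fields": [{"name": "obj1VisitorId", "type": ["null", "string"]}]},
    {"type": "record", "name": "TestObj2",
     "fields": [{"name": "obj2VisitorId", "type": ["null", "string"]}]}
  ]}]
}
""")


def union_to_group(field_name, branches):
    """Model a union as a group with one optional field per non-null branch.

    Hypothetical helper: the "memberN" names are an assumed convention.
    """
    non_null = [b for b in branches if b != "null"]
    members = []
    for index, branch in enumerate(non_null):
        # Named types (records) are dicts with a "name"; primitives are strings.
        type_name = branch["name"] if isinstance(branch, dict) else branch
        members.append(
            {"field": f"member{index}", "repetition": "optional", "type": type_name}
        )
    return {"group": field_name, "fields": members}


def encode_value(group, branch_index, value):
    """Build a row for the group in which only the chosen member is defined."""
    return {
        member["field"]: (value if i == branch_index else None)
        for i, member in enumerate(group["fields"])
    }


field = AVRO_SCHEMA["fields"][0]
group = union_to_group(field["name"], field["type"])

# A TestObj2 value occupies the second member; the first member stays undefined.
row = encode_value(group, 1, {"obj2VisitorId": "abc"})
```

In this model `objectLink` becomes a group with two optional fields, one per record type, and each row fills exactly one of them, which is the "only one field is defined at a time" behaviour described above.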
> Hive Avro to Parquet table conversion
> -------------------------------------
>
> Key: PARQUET-155
> URL: https://issues.apache.org/jira/browse/PARQUET-155
> Project: Parquet
> Issue Type: Bug
> Reporter: Dmitriy
>
> Hi.
> I have the following Avro schema:
> {code}
> {
> "namespace" : "com.example.test",
> "type" : "record",
> "name" : "TestRecord",
> "fields" : [{"name" : "objectLink", "type" : [
> {"type": "record", "name" : "TestObj1", "fields" :
> [{"name":"obj1VisitorId","type":["null","string"]}] },
> {"type": "record", "name" : "TestObj2", "fields" :
> [{"name":"obj2VisitorId","type":["null","string"]}]}
> ]
> }],
> "doc" : "event for test purposes"
> }
> {code}
> Using this schema I can create Avro objects, and I'm also able to create a
> table backed by Avro in Hive. But when I try to create a table backed by
> Parquet by running
> CREATE TABLE parquet_table
> STORED AS parquet
> AS SELECT * FROM avro_table
> I get
> SemanticException java.lang.UnsupportedOperationException: Unknown field
> type: uniontype<struct<obj1visitorid:string>,struct<obj2visitorid:string>>
> Is there a way to convert such structures so they can be stored in a Hive
> table backed by Parquet? This is a simple example, but I have a big data
> structure described in Avro, so I can't convert it manually. I also have data
> that is already stored in Avro and needs to be loaded into a table backed by
> Parquet. Is there any way to do this?
> I'm using Hive 0.13.