[ 
https://issues.apache.org/jira/browse/PARQUET-155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14260392#comment-14260392
 ] 

Ryan Blue commented on PARQUET-155:
-----------------------------------

bq. If I'll do this, then I need to define parquet backed table by myself, am I 
right?

Yes, you would until HIVE-8950 is merged and released.

In the meantime, if you already have an Avro table (with some minor 
restrictions), you can use Kite to work around the problem. Kite is a layer 
around Parquet and Avro that manages a collection of files as a dataset, 
similar to what Hive does. Kite can inspect an Avro Hive table and give you its 
schema, which you can then use to create a Parquet table. Finally, you can copy 
the data from the Avro table to the Parquet table:

{code}
kite-dataset schema avro_table --output hdfs:/user/me/schemas/table.avsc
kite-dataset create parquet_table --schema hdfs:/user/me/schemas/table.avsc --format parquet
kite-dataset copy avro_table parquet_table
{code}

You can find more information on Kite at 
[kitesdk.org|http://kitesdk.org/docs/current/] and find more help on the Kite 
mailing list.

> Hive Avro to Parquet table conversion
> -------------------------------------
>
>                 Key: PARQUET-155
>                 URL: https://issues.apache.org/jira/browse/PARQUET-155
>             Project: Parquet
>          Issue Type: Bug
>            Reporter: Dmitriy
>
> Hi.
> I have the following Avro schema:
> {code}
> {
>      "namespace" : "com.example.test",
>      "type" : "record",
>      "name" : "TestRecord",
>      "fields" : [{"name" : "objectLink", "type" : [
>                            {"type": "record", "name" : "TestObj1", "fields" : 
> [{"name":"obj1VisitorId","type":["null","string"]}] },
>                            {"type": "record", "name" : "TestObj2", "fields" : 
> [{"name":"obj2VisitorId","type":["null","string"]}]}
>                        ]
>                  }],
>      "doc" : "event for test purposes"
> }
> {code}
> Using this schema I can create Avro objects, and I can also create a table 
> backed by Avro in Hive. But when I try to create a table backed by Parquet 
> with 
> CREATE TABLE parquet_table 
> STORED AS parquet
> AS SELECT * FROM avro_table
> I get 
> SemanticException java.lang.UnsupportedOperationException: Unknown field 
> type: uniontype<struct<obj1visitorid:string>,struct<obj2visitorid:string>>
> Is there a way to convert such structures so they can be stored in a Hive 
> table backed by Parquet? This is a simple example, but I have a big data 
> structure described in Avro, so I can't convert it manually. I also have data 
> that is already stored in Avro and needs to be loaded into a table backed by 
> Parquet. Is there any way to do this?
> I'm using Hive 0.13.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
