Hi Folks,

Any suggestions or thoughts on the question/issue posted below?

Regards
Srinivas

On 2018/09/19 10:47:38, Srinivas M <s...@gmail.com> wrote:
> Hi,
>
> We have a Java application that writes Parquet files. We are using the
> Parquet 1.9.0 API to write Timestamp data. Because of the incompatibility
> between the Parquet and Hive representations of Timestamp data, we have
> tried to work around it by writing each Parquet Timestamp value as a
> 12-byte array, converting the Timestamp fields into the format Hive
> expects. However, when setting the field type in the schema, the Avro
> schema types have no enumeration for INT96, so we set the field to bytes,
> assuming Hive would still be able to read the data since it is written in
> the format Hive expects. When we try to read the data from the Hive
> table, we run into the exception below.
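>
> For reference, this is roughly how we build the 12-byte value for each
> Timestamp (8 bytes of little-endian nanoseconds within the day, followed
> by a 4-byte little-endian Julian day number). This is only a sketch of
> our conversion; the class and method names are made up for illustration,
> and it assumes UTC-based day boundaries:
>
> import java.nio.ByteBuffer;
> import java.nio.ByteOrder;
> import java.sql.Timestamp;
> import java.util.concurrent.TimeUnit;
>
> public class Int96TimestampEncoder {
>     // Julian day number of the Unix epoch (1970-01-01).
>     private static final long JULIAN_DAY_OF_EPOCH = 2440588L;
>
>     // Packs a Timestamp into the 12 bytes Hive/Impala expect for INT96.
>     public static byte[] toInt96Bytes(Timestamp ts) {
>         long millis = ts.getTime();                        // epoch millis, includes ms precision
>         long days = Math.floorDiv(millis, TimeUnit.DAYS.toMillis(1));
>         long millisOfDay = millis - days * TimeUnit.DAYS.toMillis(1);
>         long nanosOfDay = TimeUnit.MILLISECONDS.toNanos(millisOfDay)
>                 + ts.getNanos() % 1_000_000;               // add sub-millisecond nanos
>
>         ByteBuffer buf = ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN);
>         buf.putLong(nanosOfDay);                           // bytes 0-7: nanos within the day
>         buf.putInt((int) (JULIAN_DAY_OF_EPOCH + days));    // bytes 8-11: Julian day number
>         return buf.array();
>     }
> }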
>
>
> Questions:
> ----------
> 1. Is there any way we can work around this issue and make Hive read the
>    data when the timestamp field is written as bytes?
> 2. Is there any way the data type can be set to INT96 in the Parquet
>    schema? (A sketch of the schema we are after follows this list.)
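>
> To make question 2 concrete, the sketch below shows the kind of schema
> we are after. It uses the low-level parquet-mr Types builder, which, as
> far as we can tell, does expose INT96, instead of going through Avro.
> This is an illustration only; the column names mirror the file schema
> further down, and writing records against such a schema would need a
> lower-level WriteSupport rather than AvroParquetWriter:
>
> import org.apache.parquet.schema.MessageType;
> import org.apache.parquet.schema.OriginalType;
> import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName;
> import org.apache.parquet.schema.Types;
>
> public class Int96SchemaSketch {
>     // Same columns as our current file, but with INT96 for the two
>     // timestamp columns instead of plain BINARY.
>     public static MessageType buildSchema() {
>         return Types.buildMessage()
>                 .required(PrimitiveTypeName.INT32).named("C1")
>                 .required(PrimitiveTypeName.BINARY).as(OriginalType.UTF8).named("C2")
>                 .required(PrimitiveTypeName.BINARY).as(OriginalType.UTF8).named("C3")
>                 .required(PrimitiveTypeName.INT96).named("C4")
>                 .required(PrimitiveTypeName.INT96).named("C5")
>                 .named("parquet_filecc");
>     }
> }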
>
> Exception:
> ========
> Failed with exception
> java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException:
> java.lang.ClassCastException: org.apache.hadoop.io.BytesWritable cannot
> be cast to org.apache.hadoop.hive.serde2.io.TimestampWritable
> ========
>
> Schema of the file
> =============
> file schema: parquet.filecc
> --------------------------------------------------------------------------------
> C1:          REQUIRED INT32 R:0 D:0
> C2:          REQUIRED BINARY O:UTF8 R:0 D:0
> C3:          REQUIRED BINARY O:UTF8 R:0 D:0
> C4:          REQUIRED BINARY R:0 D:0                  ----> Timestamp column
> C5:          REQUIRED BINARY R:0 D:0                  ----> Timestamp column
> --------------------------------------------------------------------------------
>
> hive> show create table HiveParquetTimestamp;
> OK
> CREATE EXTERNAL TABLE `HiveParquetTimestamp`(
>   `c1` int,
>   `c2` char(4),
>   `c3` varchar(8),
>   `c4` timestamp,
>   `c5` timestamp)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
> LOCATION
>   'hdfs://cdhkrb123.fyre.com:8020/tmp/HiveParquetTimestamp'
>
> --
> Srinivas
> (*-*)
>
> You have to grow from the inside out. None can teach you, none can make
> you spiritual.
>                       -Narendra Nath Dutta (Swamy Vivekananda)
