Hi,

We have a Java application that writes Parquet files, and we use the Parquet 1.9.0 API to write Timestamp data. Because Parquet and Hive represent Timestamp data in incompatible ways, we tried to work around this by writing each Parquet timestamp as a 12-byte array, converting the Timestamp fields into the format Hive expects. However, the Avro schema types have no enumeration for INT96, so we declared these fields as bytes, assuming Hive would still be able to read the data because it is written in the format Hive expects. When we try to read the data from the Hive table, we get the exception shown below.
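For reference, the 12-byte INT96 timestamp layout that Hive's Parquet reader uses is commonly described as 8 bytes of nanoseconds since midnight followed by 4 bytes of Julian day number, both little-endian. The Java sketch below encodes and decodes that layout using only the JDK. The class and method names are hypothetical, and it assumes the value passed in is already the wall-clock time you want stored; Hive has historically converted INT96 values between UTC and the local time zone, so time-zone handling depends on your Hive version and settings. Also note that correct bytes alone probably do not avoid the ClassCastException: Hive appears to choose its converter from the Parquet physical type, so a BINARY column is handed over as BytesWritable whatever it contains, and the column likely still has to be declared INT96 in the Parquet schema (which the Avro schema cannot express).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;

// Hypothetical helper: illustrates the INT96 byte layout only, not a Parquet writer.
public class Int96TimestampSketch {

    // Julian day number of the Unix epoch (1970-01-01).
    private static final long JULIAN_DAY_OF_UNIX_EPOCH = 2_440_588L;

    // Encodes a wall-clock timestamp into 12 bytes:
    //   bytes 0-7  = nanoseconds since midnight (little-endian long)
    //   bytes 8-11 = Julian day number (little-endian int)
    // A java.sql.Timestamp can be passed in via Timestamp.toLocalDateTime().
    // Assumption: no time-zone adjustment is applied here.
    public static byte[] toInt96(LocalDateTime ts) {
        long nanosOfDay = ts.toLocalTime().toNanoOfDay();
        long julianDay = ts.toLocalDate().toEpochDay() + JULIAN_DAY_OF_UNIX_EPOCH;
        return ByteBuffer.allocate(12)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putLong(nanosOfDay)
                .putInt(Math.toIntExact(julianDay))
                .array();
    }

    // Decodes the same 12-byte layout back into a LocalDateTime.
    public static LocalDateTime fromInt96(byte[] bytes) {
        if (bytes.length != 12) {
            throw new IllegalArgumentException("expected 12 bytes, got " + bytes.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = buf.getLong();
        int julianDay = buf.getInt();
        LocalDate date = LocalDate.ofEpochDay(julianDay - JULIAN_DAY_OF_UNIX_EPOCH);
        return LocalDateTime.of(date, LocalTime.ofNanoOfDay(nanosOfDay));
    }

    public static void main(String[] args) {
        LocalDateTime ts = LocalDateTime.of(2017, 3, 1, 12, 30, 45, 123_456_789);
        byte[] int96 = toInt96(ts);
        System.out.println(int96.length);      // 12
        System.out.println(fromInt96(int96));  // 2017-03-01T12:30:45.123456789
    }
}
```

The round trip only checks that the bytes are self-consistent; whether Hive reads them as intended still depends on the column being typed INT96 rather than BINARY.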
Questions:
---------------
1. Is there any way to work around this issue so that Hive can read the data when the timestamp field is declared as bytes?
2. Is there any way to set the data type to INT96 in the Parquet schema?

Exception:
========
Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.BytesWritable cannot be cast to org.apache.hadoop.hive.serde2.io.TimestampWritable
========

Schema of the file:
=============
file schema: parquet.filecc
--------------------------------------------------------------------------------
C1: REQUIRED INT32 R:0 D:0
C2: REQUIRED BINARY O:UTF8 R:0 D:0
C3: REQUIRED BINARY O:UTF8 R:0 D:0
C4: REQUIRED BINARY R:0 D:0   ----> Timestamp Column
C5: REQUIRED BINARY R:0 D:0   ----> Timestamp Column
--------------------------------------------------------------------------------

hive> show create table HiveParquetTimestamp;
OK
CREATE EXTERNAL TABLE `HiveParquetTimestamp`(
  `c1` int,
  `c2` char(4),
  `c3` varchar(8),
  `c4` timestamp,
  `c5` timestamp)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  'hdfs://cdhkrb123.fyre.com:8020/tmp/HiveParquetTimestamp'

--
Srinivas