Hello All:
I tried a sample program that uses the "date" and "timestamp" datatypes with
the parquet-avro 1.9.0 API, and found the following:
=================================================================
Regarding "Date" data: (working fine):
--------------------------------------------------------
1) I could write "date" data and read it back correctly using the
parquet-avro 1.9.0 API. When I read the same file via Hive (as a Hive
table), the "date" data was displayed correctly. I could also successfully
read Parquet files (containing the "date" datatype) that were created via
Hive.
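For context, the schema I used for the working "date" case was along these
lines (the record and field names here are just illustrative). Avro's date
logical type is an int, which parquet-avro writes as Parquet INT32 annotated
as DATE; as far as I understand, Hive uses the same physical representation,
which is presumably why this datatype works in both directions:

```
{
  "type": "record",
  "name": "DateRecord",
  "fields": [
    { "name": "event_date",
      "type": { "type": "int", "logicalType": "date" } }
  ]
}
```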
Regarding "Timestamp" data: (has issues):
----------------------------------------------------------------
2) Along the same lines, I could write "timestamp" data and read it back
correctly using the parquet-avro 1.9.0 API. However, when I tried to read
the same file using Hive (as a Hive table), I got the following error:
"Failed with exception
java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException:
java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be
cast to org.apache.hadoop.hive.serde2.io.TimestampWritable".
And when I tried to read a Parquet file (containing timestamp data, created
via Hive) using the sample program (which uses the parquet-avro 1.9.0 API),
I got the following exception:
error reading file: /tmp/000000_0 - printing stack trace....
java.lang.IllegalArgumentException: INT96 not yet implemented.
at org.apache.parquet.avro.AvroSchemaConverter$1.convertINT96(AvroSchemaConverter.java:279)
at org.apache.parquet.avro.AvroSchemaConverter$1.convertINT96(AvroSchemaConverter.java:264)
at org.apache.parquet.schema.PrimitiveType$PrimitiveTypeName$7.convert(PrimitiveType.java:223)
at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:263)
at org.apache.parquet.avro.AvroSchemaConverter.convertFields(AvroSchemaConverter.java:241)
at org.apache.parquet.avro.AvroSchemaConverter.convert(AvroSchemaConverter.java:231)
at org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:124)
at org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:175)
at org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:149)
at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:125)
at pqtr.main(pqtr.java:46)
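For what it's worth, the two failures look consistent with a representation
mismatch rather than a bug in my sample program: Hive writes its timestamp
columns using the (deprecated) INT96 physical type, which AvroSchemaConverter
does not handle at all (hence "INT96 not yet implemented"), while an Avro
timestamp field such as the illustrative one below is written as a plain
INT64, which Hive's Parquet SerDe then reads back as a LongWritable instead
of a TimestampWritable:

```
{ "name": "event_ts",
  "type": { "type": "long", "logicalType": "timestamp-millis" } }
```

If that reading is correct, the question below comes down to whether either
side plans to bridge the INT96/INT64 gap.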
=================================================================
Is this the expected result? Is the "timestamp" datatype still INCOMPATIBLE
between Hive's Parquet tables and Parquet files generated by the
parquet-avro API? If so, are there any plans to provide compatibility (or a
workaround) for such scenarios?
Could you please let me know?
Thanks,
Ravi