Hello All:
I tried a sample program that uses the "date" and "timestamp" datatypes with
the parquet-avro 1.9.0 API, and found the following:
=================================================================
Regarding "Date" data: (working fine):
--------------------------------------------------------
1) I could write "date" data and read it back correctly using the
parquet-avro 1.9.0 API. When I read the same file via Hive (as a Hive
table), the "date" data was displayed correctly. I could also successfully
read Parquet files (containing the "date" datatype) that were created via
Hive.
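For context, the schema I used for the working "date" case was along these
lines (the record and field names here are just illustrative). Avro's date
logical type is an int, which parquet-avro writes as Parquet INT32 annotated
as DATE; as far as I understand, Hive uses the same physical representation,
which is presumably why this datatype works in both directions:

```
{
  "type": "record",
  "name": "DateRecord",
  "fields": [
    { "name": "event_date",
      "type": { "type": "int", "logicalType": "date" } }
  ]
}
```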
Regarding "Timestamp" data: (has issues):
----------------------------------------------------------------
2) Along the same lines, I could write "timestamp" data and read it back
correctly using the parquet-avro 1.9.0 API. However, when I tried to read
the same file using Hive (as a Hive table), I got the following error:
"Failed with exception
java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException:
java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be
cast to org.apache.hadoop.hive.serde2.io.TimestampWritable".
And when I tried to read a Parquet file (containing timestamp data, created
via Hive) using the sample program (which uses the parquet-avro 1.9.0 API),
I got the following exception:
error reading file: /tmp/000000_0 - printing stack trace....
java.lang.IllegalArgumentException: INT96 not yet implemented.
at org.apache.parquet.avro.AvroSchemaConverter$1.convertINT96(AvroSchemaConverter.java:279)
at org.apache.parquet.avro.AvroSchemaConverter$1.convertINT96(AvroSchemaConverter.java:264)
at org.apache.parquet.schema.PrimitiveType$PrimitiveTypeName$7.convert(PrimitiveType.java:223)
at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:263)
at org.apache.parquet.avro.AvroSchemaConverter.convertFields(AvroSchemaConverter.java:241)
at org.apache.parquet.avro.AvroSchemaConverter.convert(AvroSchemaConverter.java:231)
at org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:124)
at org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:175)
at org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:149)
at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:125)
at pqtr.main(pqtr.java:46)
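For what it's worth, the two failures look consistent with a representation
mismatch rather than a bug in my sample program: Hive writes its timestamp
columns using the (deprecated) INT96 physical type, which AvroSchemaConverter
does not handle at all (hence "INT96 not yet implemented"), while an Avro
timestamp field such as the illustrative one below is written as a plain
INT64, which Hive's Parquet SerDe then reads back as a LongWritable instead
of a TimestampWritable:

```
{ "name": "event_ts",
  "type": { "type": "long", "logicalType": "timestamp-millis" } }
```

If that reading is correct, the question below comes down to whether either
side plans to bridge the INT96/INT64 gap.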
=================================================================
Is this the expected result? Is the "timestamp" datatype still INCOMPATIBLE
between Hive's Parquet tables and Parquet files generated by the
parquet-avro API? If so, are there any plans to provide compatibility (or a
workaround) for such scenarios?
Could you please let me know?
Thanks,
Ravi