Hi Ravi,

Hive's int96 timestamp is based on a format originally used by the Impala
project. It isn't well-defined, it assumes that all int96 values are
timestamps, and it carries nanosecond precision. It's not a good idea to
use it, so I don't think we will be implementing support for it in the
Avro API. There is, however, support for the timestamp-millis and
timestamp-micros types in 1.9.0.
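If you need to interpret these values in the meantime: Hive and Impala write INT96 timestamps as 12 bytes, a little-endian 8-byte count of nanoseconds within the day followed by a little-endian 4-byte Julian day number. Here's a minimal decoding sketch; the class and method names are mine, and it assumes you've already obtained the raw 12-byte value (e.g. from a Binary) by some other means:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class Int96Decoder {
    // Julian day number of the Unix epoch, 1970-01-01.
    static final long JULIAN_EPOCH_DAY = 2_440_588L;
    static final long MILLIS_PER_DAY = 86_400_000L;

    /** Decode a 12-byte INT96 value (Hive/Impala layout) to epoch milliseconds. */
    static long int96ToEpochMillis(byte[] int96) {
        ByteBuffer buf = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = buf.getLong();              // first 8 bytes
        long julianDay = buf.getInt() & 0xFFFFFFFFL;  // last 4 bytes, unsigned
        return (julianDay - JULIAN_EPOCH_DAY) * MILLIS_PER_DAY
                + nanosOfDay / 1_000_000L;
    }

    public static void main(String[] args) {
        // 1970-01-01T00:00:00Z: nanos-of-day = 0, Julian day = 2440588
        byte[] epoch = ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN)
                .putLong(0L).putInt(2_440_588).array();
        System.out.println(int96ToEpochMillis(epoch)); // prints 0
    }
}
```

Note that the division truncates sub-millisecond precision; keep the nanoseconds-of-day value if you need the full resolution.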
rb

On Wed, Jul 6, 2016 at 3:17 AM, Ravi Tatapudi <[email protected]> wrote:
> Hello,
>
> I tried reading timestamp data from a Parquet file (created as part of a
> Hive table stored in Parquet format) with a Java sample program using
> parquet-avro API version 1.8.1, and I got the below exception:
>
> ================================================
> java.lang.IllegalArgumentException: INT96 not yet implemented.
>     at org.apache.parquet.avro.AvroSchemaConverter$1.convertINT96(AvroSchemaConverter.java:252)
>     at org.apache.parquet.avro.AvroSchemaConverter$1.convertINT96(AvroSchemaConverter.java:237)
>     at org.apache.parquet.schema.PrimitiveType$PrimitiveTypeName$7.convert(PrimitiveType.java:223)
>     at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:236)
>     at org.apache.parquet.avro.AvroSchemaConverter.convertFields(AvroSchemaConverter.java:216)
>     at org.apache.parquet.avro.AvroSchemaConverter.convert(AvroSchemaConverter.java:210)
>     at org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:124)
>     at org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:171)
>     at org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:149)
>     at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:125)
>     at pqtr.main(pqtr.java:63)
> ================================================
>
> I have looked at the Parquet code and see the following in
> "org/apache/parquet/avro/AvroSchemaConverter.java":
>
>     public Schema convertINT96(PrimitiveTypeName primitiveTypeName) {
>         throw new IllegalArgumentException("INT96 not yet implemented.");
>
> However, in other parts of the code (in the files
> org/apache/parquet/encodings/FileEncodingsIT.java and
> org/apache/parquet/statistics/TestStatistics.java), I see that
> "convertINT96" is implemented to return Binary values.
>
> In this context, I am trying to figure out why the parquet-avro API is
> throwing an error instead of trying to return "Binary" (or
> "Fixed_len_binary_array") values.
>
> Will this be supported in the next Parquet release (1.9.0)? If it is
> already fixed and can be obtained via a pull request, I request you to
> point me to the same.
>
> Thanks,
> Ravi
>
>
> From: Ravi Tatapudi/India/IBM
> To: [email protected]
> Date: 07/04/2016 12:28 PM
> Subject: To read/write "timestamp" data from/to Parquet-formatted
> files on HDFS.
>
> Hello,
>
> I am trying to write/read "timestamp" data to/from
> Parquet-formatted files.
>
> As I understand, the latest parquet-avro API (version 1.8.1) doesn't
> support "timestamp". In this context, what other options/APIs are
> available to read/write "timestamp" data from/to Parquet files?
>
> Please let me know (and if there are any examples, could you please point
> me to the same).
>
> Thanks,
> Ravi

--
Ryan Blue
Software Engineer
Netflix
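[For readers finding this thread later: with the timestamp-millis logical type mentioned above, the Avro side of the schema would look something like the sketch below. The record and field names here are made up for illustration; check the Avro 1.8+ specification for the exact logical-type rules.]

```json
{
  "type": "record",
  "name": "Event",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "created_at",
     "type": {"type": "long", "logicalType": "timestamp-millis"}}
  ]
}
```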
