Hello,

I tried reading timestamp data from a Parquet file (created as part of a 
Hive table stored in Parquet format) with a sample Java program using 
parquet-avro version 1.8.1, and I got the exception below:

================================================
java.lang.IllegalArgumentException: INT96 not yet implemented.
        at org.apache.parquet.avro.AvroSchemaConverter$1.convertINT96(AvroSchemaConverter.java:252)
        at org.apache.parquet.avro.AvroSchemaConverter$1.convertINT96(AvroSchemaConverter.java:237)
        at org.apache.parquet.schema.PrimitiveType$PrimitiveTypeName$7.convert(PrimitiveType.java:223)
        at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:236)
        at org.apache.parquet.avro.AvroSchemaConverter.convertFields(AvroSchemaConverter.java:216)
        at org.apache.parquet.avro.AvroSchemaConverter.convert(AvroSchemaConverter.java:210)
        at org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:124)
        at org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:171)
        at org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:149)
        at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:125)
        at pqtr.main(pqtr.java:63)
================================================

I looked at the Parquet code and see the following in 
"org/apache/parquet/avro/AvroSchemaConverter.java":

            public Schema convertINT96(PrimitiveTypeName primitiveTypeName) {
              throw new IllegalArgumentException("INT96 not yet implemented.");
            }

However, in other parts of the code (in the files 
org/apache/parquet/encodings/FileEncodingsIT.java and 
org/apache/parquet/statistics/TestStatistics.java), I see that 
"convertINT96" is implemented to return Binary values.

Given this, I am trying to understand why the parquet-avro API throws an 
error instead of returning "Binary" or "fixed_len_byte_array" values.
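
To illustrate why getting the raw bytes back would already be enough for 
my purpose, here is a minimal sketch of decoding such a value on the 
application side. It assumes the convention commonly used by Hive and 
Impala for INT96 timestamps: 12 bytes, the first 8 holding the nanoseconds 
within the day and the last 4 holding the Julian day number, both 
little-endian. The class name Int96Timestamp and the sample value are only 
for illustration and are not part of any Parquet API; any time-zone 
adjustment applied by the writer is not handled here.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.time.Instant;

public class Int96Timestamp {
    // Julian day number of 1970-01-01 (the Unix epoch).
    private static final long JULIAN_DAY_OF_EPOCH = 2440588L;
    private static final long SECONDS_PER_DAY = 86400L;
    private static final long NANOS_PER_SECOND = 1_000_000_000L;

    /**
     * Decodes a 12-byte INT96 value, assuming the layout
     * [8 bytes nanos-of-day][4 bytes Julian day], both little-endian.
     */
    public static Instant decode(byte[] int96) {
        if (int96 == null || int96.length != 12) {
            throw new IllegalArgumentException("INT96 value must be exactly 12 bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = buf.getLong();
        long julianDay = buf.getInt();
        long epochSeconds = (julianDay - JULIAN_DAY_OF_EPOCH) * SECONDS_PER_DAY
                + nanosOfDay / NANOS_PER_SECOND;
        return Instant.ofEpochSecond(epochSeconds, nanosOfDay % NANOS_PER_SECOND);
    }

    public static void main(String[] args) {
        // Hypothetical sample value: Julian day 2457574 (2016-07-04),
        // 44880 seconds (12:28:00) into the day.
        byte[] raw = ByteBuffer.allocate(12)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putLong(44880L * NANOS_PER_SECOND)
                .putInt(2457574)
                .array();
        System.out.println(decode(raw)); // prints 2016-07-04T12:28:00Z
    }
}
```

With something like this, a reader that exposed INT96 as a 12-byte binary 
(or fixed) value could leave the timestamp interpretation to the caller.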

Will this be supported in the next Parquet release (1.9.0)? If it is 
already fixed and available via a pull request, please point me to it.

Thanks,
 Ravi



From:   Ravi Tatapudi/India/IBM
To:     [email protected]
Date:   07/04/2016 12:28 PM
Subject:        To read/write "timestamp" data from/to Parquet-formatted 
files on HDFS.


Hello,

I am trying to write/read "timestamp" data to/from 
Parquet-formatted-files.

As I understand it, the latest parquet-avro API version (1.8.1) doesn't 
support "timestamp". In this context, what other options/APIs are 
available to read/write "timestamp" data from/to Parquet files?

Please let me know (and if there are any examples, could you please point 
me to the same).

Thanks,
 Ravi


