Hi Ravi,

Hive's int96 timestamp is based on a format originally used by the Impala
project. It isn't well-defined, it assumes that all int96 values are
timestamps, and it uses nanosecond precision. It's not a good idea to
use it, so I don't think we will be implementing support for it in the Avro
API. There is, however, support for the timestamp-millis and
timestamp-micros types in 1.9.0.
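If you need to pull those int96 values out yourself in the meantime, one option is to read the field as raw bytes and decode them directly. Here is a rough sketch, assuming the usual Impala layout (8 little-endian bytes of nanoseconds-of-day followed by a 4-byte little-endian Julian day number); the class and method names are just illustrative:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class Int96Decode {
    // Julian day number of the Unix epoch, 1970-01-01.
    static final long JULIAN_EPOCH_DAY = 2440588L;
    static final long MILLIS_PER_DAY = 86_400_000L;

    /** Decodes a 12-byte Impala-style INT96 timestamp into millis since the Unix epoch (UTC). */
    static long int96ToMillis(byte[] int96) {
        ByteBuffer buf = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = buf.getLong();              // first 8 bytes: nanoseconds within the day
        long julianDay = buf.getInt() & 0xFFFFFFFFL;  // last 4 bytes: Julian day number
        return (julianDay - JULIAN_EPOCH_DAY) * MILLIS_PER_DAY + nanosOfDay / 1_000_000L;
    }

    public static void main(String[] args) {
        // Sanity check: midnight 1970-01-01 is Julian day 2440588 with 0 nanos.
        ByteBuffer buf = ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN);
        buf.putLong(0L).putInt(2440588);
        System.out.println(int96ToMillis(buf.array())); // prints 0
    }
}
```

Note that this only covers decoding the bytes once you have them; you would still need to read the column as binary rather than through the Avro converter.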

rb

On Wed, Jul 6, 2016 at 3:17 AM, Ravi Tatapudi <[email protected]>
wrote:

> Hello,
>
> I tried reading timestamp data from a Parquet file (created as part of a
> Hive table stored in Parquet format) with a Java sample program using
> parquet-avro API version 1.8.1, and I got the exception below:
>
> ================================================
> java.lang.IllegalArgumentException: INT96 not yet implemented.
>         at org.apache.parquet.avro.AvroSchemaConverter$1.convertINT96(AvroSchemaConverter.java:252)
>         at org.apache.parquet.avro.AvroSchemaConverter$1.convertINT96(AvroSchemaConverter.java:237)
>         at org.apache.parquet.schema.PrimitiveType$PrimitiveTypeName$7.convert(PrimitiveType.java:223)
>         at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:236)
>         at org.apache.parquet.avro.AvroSchemaConverter.convertFields(AvroSchemaConverter.java:216)
>         at org.apache.parquet.avro.AvroSchemaConverter.convert(AvroSchemaConverter.java:210)
>         at org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:124)
>         at org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:171)
>         at org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:149)
>         at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:125)
>         at pqtr.main(pqtr.java:63)
> ================================================
>
> I have looked at the Parquet code and see the following in
> "org/apache/parquet/avro/AvroSchemaConverter.java":
>
>     public Schema convertINT96(PrimitiveTypeName primitiveTypeName) {
>       throw new IllegalArgumentException("INT96 not yet implemented.");
>     }
> However, in other parts of the code (in the files
> org/apache/parquet/encodings/FileEncodingsIT.java and
> org/apache/parquet/statistics/TestStatistics.java), I see that
> "convertINT96" is implemented to return Binary values.
>
> In this context, I am trying to figure out why the parquet-avro API
> throws an error instead of returning "Binary" (or
> "fixed_len_byte_array") values.
>
> Will this be supported in the next Parquet release (1.9.0)? If it is
> already fixed and can be obtained via a pull request, could you please
> point me to it?
>
> Thanks,
>  Ravi
>
>
>
> From:   Ravi Tatapudi/India/IBM
> To:     [email protected]
> Date:   07/04/2016 12:28 PM
> Subject:        To read/write "timestamp" data from/to Parquet-formatted
> files on HDFS.
>
>
> Hello,
>
> I am trying to write/read "timestamp" data to/from Parquet-formatted
> files.
>
> As I understand it, the latest parquet-avro API (version 1.8.1) doesn't
> support "timestamp". In this context, what other options/APIs are
> available to read/write "timestamp" data from/to Parquet files?
>
> Please let me know (and if there are any examples, could you please point
> me to the same).
>
> Thanks,
>  Ravi
>
>
>
>


-- 
Ryan Blue
Software Engineer
Netflix
