I think the only support right now is in Hive and Spark.
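(For reference, the Hive/Spark support decodes the 12-byte INT96 value directly. Below is a minimal plain-Java sketch of that decoding, assuming the commonly used Impala convention: 8 little-endian bytes of nanoseconds within the day, followed by a 4-byte little-endian Julian day number. The class and method names are illustrative, and the byte[] would come from a Binary value obtained with a lower-level Parquet reader.)

================================================================
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class Int96Timestamps {

    // 1970-01-01 expressed as a Julian day number.
    private static final long JULIAN_EPOCH_DAY = 2440588L;
    private static final long MILLIS_PER_DAY = 86400000L;

    /** Decodes a 12-byte Impala-style INT96 timestamp into millis since the Unix epoch. */
    public static long toEpochMillis(byte[] int96) {
        ByteBuffer buf = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = buf.getLong(); // first 8 bytes: nanoseconds within the day
        long julianDay = buf.getInt();   // last 4 bytes: Julian day number
        return (julianDay - JULIAN_EPOCH_DAY) * MILLIS_PER_DAY + nanosOfDay / 1000000L;
    }
}
================================================================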
On Fri, Jul 29, 2016 at 7:15 AM, Ravi Tatapudi <[email protected]> wrote:

> Hello Ryan:
>
> Did you get a chance to see my queries in the mail below? Basically, I am
> trying to understand which API we should use to read "timestamp" data
> (even after truncating the nano/micro/milli-seconds part) from Parquet
> files created by Hive or any other application, which essentially boils
> down to the queries below. Is it the parquet-avro API, or some other API?
>
> ------------------------------------
> 1) Is it possible to read "timestamp" data from a Parquet file (generated
> by Hive, as part of a table stored as Parquet with timestamp rows
> inserted) using a standalone Java application with the parquet-avro API
> 1.9.0?
>
> 2) Would "timestamp" data written to a Parquet file by a standalone Java
> application (using the parquet-avro API 1.9.0) be read by Hive
> successfully?
>
> 3) Using the Parquet 1.9.0 API, when we try to read/write data from Hive,
> does it successfully read (or write) the data after truncating the
> nanoseconds part, or will it fail with "incompatible object" errors?
> ------------------------------------
>
> Could you please let me know your thoughts...
>
> Thanks,
> Ravi
>
>
>
> From: Ravi Tatapudi/India/IBM
> To: [email protected]
> Cc: Srinivas Mudigonda/India/IBM@IBMIN
> Date: 07/25/2016 11:35 AM
> Subject: Re: To read/write "timestamp" data from/to Parquet-formatted
> files on HDFS.
>
>
> Hello Ryan:
>
> Many thanks for the reply.
>
> Our requirement is to read "timestamp" data from Parquet files on HDFS
> (created as part of Hive tables stored as Parquet). At this point, we are
> not really looking for the milli/micro/nano-seconds part of the
> timestamp, but are trying to read the timestamp data in "YYYY-MM-DD
> hh:mm:ss" format.
>
> In this context, could you please provide your inputs on the following
> queries, so that we can plan accordingly:
>
> ================================================================
> 1) Is it possible to read "timestamp" data from a Parquet file (generated
> by Hive, as part of a table stored as Parquet with timestamp rows
> inserted) using a standalone Java application with the parquet-avro API
> 1.9.0?
>
> 2) Would "timestamp" data written to a Parquet file by a standalone Java
> application (using the parquet-avro API 1.9.0) be read by Hive
> successfully?
>
> 3) Using the Parquet 1.9.0 API, when we try to read/write data from Hive,
> does it successfully read (or write) the data after truncating the
> nanoseconds part, or will it fail with "incompatible object" errors?
> ================================================================
>
> Thanks,
> Ravi
>
>
>
>
>
> From: Ryan Blue <[email protected]>
> To: Parquet Dev <[email protected]>
> Cc: Srinivas Mudigonda/India/IBM@IBMIN
> Date: 07/22/2016 11:27 PM
> Subject: Re: To read/write "timestamp" data from/to Parquet-formatted
> files on HDFS.
>
>
>
> Hi Ravi,
>
> Hive's int96 timestamp is based on a format originally used by the Impala
> project. It isn't well-defined, assumes that all int96 values are
> timestamps, and implements nanosecond precision. It's not a good idea to
> use it, so I don't think we will be implementing support for it in the
> Avro API. There is, however, support for timestamp-millis and
> timestamp-micros types in 1.9.0.
>
> rb
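(As a concrete illustration of the timestamp-millis support mentioned above, a standalone application could write and read such a column roughly as follows. This is an untested sketch, assuming parquet-avro 1.9.0 with Avro 1.8's logical types on the classpath; the file path and the record/field names are made up for illustration.)

================================================================
import org.apache.avro.LogicalTypes;
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetReader;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetReader;
import org.apache.parquet.hadoop.ParquetWriter;

public class TimestampMillisExample {
    public static void main(String[] args) throws Exception {
        // An Avro long carrying the timestamp-millis logical type; parquet-avro
        // maps it to a Parquet INT64 annotated as TIMESTAMP_MILLIS.
        Schema tsMillis =
            LogicalTypes.timestampMillis().addToSchema(Schema.create(Schema.Type.LONG));
        Schema schema = SchemaBuilder.record("Event").fields()
            .name("ts").type(tsMillis).noDefault()
            .endRecord();

        Path file = new Path("/tmp/events.parquet"); // illustrative path

        try (ParquetWriter<GenericRecord> writer =
                 AvroParquetWriter.<GenericRecord>builder(file).withSchema(schema).build()) {
            GenericRecord r = new GenericData.Record(schema);
            r.put("ts", System.currentTimeMillis()); // millis since epoch, as a plain long
            writer.write(r);
        }

        try (ParquetReader<GenericRecord> reader =
                 AvroParquetReader.<GenericRecord>builder(file).build()) {
            GenericRecord r;
            while ((r = reader.read()) != null) {
                long millis = (Long) r.get("ts"); // read back as a plain long
                System.out.println(new java.sql.Timestamp(millis));
            }
        }
    }
}
================================================================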
> On Wed, Jul 6, 2016 at 3:17 AM, Ravi Tatapudi <[email protected]>
> wrote:
>
> > Hello,
> >
> > I tried reading timestamp data from a Parquet file (created as part of
> > a Hive table stored in Parquet format) with a Java sample program using
> > parquet-avro API version 1.8.1, and I got the exception below:
> >
> > ================================================
> > java.lang.IllegalArgumentException: INT96 not yet implemented.
> >     at org.apache.parquet.avro.AvroSchemaConverter$1.convertINT96(AvroSchemaConverter.java:252)
> >     at org.apache.parquet.avro.AvroSchemaConverter$1.convertINT96(AvroSchemaConverter.java:237)
> >     at org.apache.parquet.schema.PrimitiveType$PrimitiveTypeName$7.convert(PrimitiveType.java:223)
> >     at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:236)
> >     at org.apache.parquet.avro.AvroSchemaConverter.convertFields(AvroSchemaConverter.java:216)
> >     at org.apache.parquet.avro.AvroSchemaConverter.convert(AvroSchemaConverter.java:210)
> >     at org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:124)
> >     at org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:171)
> >     at org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:149)
> >     at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:125)
> >     at pqtr.main(pqtr.java:63)
> > ================================================
> >
> > I have looked at the Parquet code and see the following in
> > org/apache/parquet/avro/AvroSchemaConverter.java:
> >
> >     public Schema convertINT96(PrimitiveTypeName primitiveTypeName) {
> >         throw new IllegalArgumentException("INT96 not yet implemented.");
> >
> > However, in other parts of the code (in
> > org/apache/parquet/encodings/FileEncodingsIT.java and
> > org/apache/parquet/statistics/TestStatistics.java), I see that
> > convertINT96 is implemented to return Binary values.
> >
> > In this context, I am trying to figure out why the Parquet-Avro API
> > throws an error instead of returning "Binary" (or
> > "fixed_len_byte_array") values.
> >
> > Will this be supported in the next Parquet release (1.9.0?)? If it is
> > already fixed and can be obtained via a pull request, I request you to
> > point me to the same.
> >
> > Thanks,
> > Ravi
> >
> >
> >
> > From: Ravi Tatapudi/India/IBM
> > To: [email protected]
> > Date: 07/04/2016 12:28 PM
> > Subject: To read/write "timestamp" data from/to Parquet-formatted
> > files on HDFS.
> >
> >
> > Hello,
> >
> > I am trying to write/read "timestamp" data to/from Parquet-formatted
> > files.
> >
> > As I understand, the latest parquet-avro API version 1.8.1 doesn't
> > support "timestamp". In this context, what other options/APIs are
> > available to read/write "timestamp" data from/to Parquet files?
> >
> > Please let me know (and if there are any examples, could you please
> > point me to the same).
> >
> > Thanks,
> > Ravi
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix



--
Ryan Blue
Software Engineer
Netflix
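(On the "YYYY-MM-DD hh:mm:ss" requirement discussed in the thread: once a timestamp is in hand as milliseconds since the epoch, from either of the sketches above, truncating the sub-second part and formatting it is plain Java. A small sketch; the UTC time zone here is an assumption and should be chosen to match how the data was written.)

================================================================
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class SecondPrecision {

    /** Formats epoch millis as "YYYY-MM-DD hh:mm:ss", dropping sub-second digits. */
    public static String format(long epochMillis) {
        long truncated = (epochMillis / 1000L) * 1000L; // drop the milliseconds part
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC")); // assumption: data written as UTC
        return fmt.format(new Date(truncated));
    }
}
================================================================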
