Hello Ryan:

Many thanks for the inputs. I will try to build it today and see how it goes.

Could you please let me know an approximate date (or month) when "parquet-avro-1.9.0" (or any parquet-avro-1.8.x release that would include this fix) would be officially released (for example, by June 2016, December 2016, or later)? It would be very helpful for my planning.

Thanks,
Ravi

From: Ryan Blue <[email protected]>
To: Parquet Dev <[email protected]>
Date: 04/04/2016 10:05 PM
Subject: Re: How to write "date, timestamp, decimal" data to Parquet-files

I don't think you can get the artifacts produced by our CI builds, but you can check out the branch and build it using instructions in the repository.

On Mon, Apr 4, 2016 at 5:39 AM, Ravi Tatapudi <[email protected]> wrote:

> Hello Ryan:
>
> Regarding the support for "date, timestamp, decimal" data types for Parquet-files:
>
> In your earlier mail, you mentioned that the pull request https://github.com/apache/parquet-mr/pull/318 has the necessary support for these data types (and that it would be released as part of the parquet-avro 1.9.0 release).
>
> I see that this fix is included in build# 1247 (and above?). How do I get that build (or the latest build) that includes the "parquet-avro" JAR-file with the support for "date, timestamp", etc.? Could you please let me know.
>
> Thanks,
> Ravi
>
> From: Ryan Blue <[email protected]>
> To: Parquet Dev <[email protected]>
> Cc: Nagesh R Charka/India/IBM@IBMIN, Srinivas Mudigonda/India/IBM@IBMIN
> Date: 03/14/2016 09:56 PM
> Subject: Re: How to write "date, timestamp, decimal" data to Parquet-files
>
> Ravi,
>
> Support for those types in parquet-avro hasn't been committed yet. It's implemented in the branch I pointed you to. If you want to use released versions, it should be out in 1.9.0.
>
> rb
>
> On Sun, Mar 13, 2016 at 9:52 PM, Ravi Tatapudi <[email protected]> wrote:
>
> > Hello Ryan:
> >
> > Thanks for the inputs.
> >
> > I am building and running the test-application primarily using the following JAR-files (for the Avro, Parquet-Avro, and Hive APIs):
> >
> > 1) avro-1.8.0.jar
> > 2) parquet-avro-1.6.0.jar (the latest one found in the Maven repository: http://mvnrepository.com/artifact/com.twitter/parquet-avro/1.6.0)
> > 3) hive-exec-1.2.1.jar
> >
> > Am I supposed to build/run the test using a different version of the JAR-files? Could you please let me know.
> >
> > Thanks,
> > Ravi
> >
> > From: Ryan Blue <[email protected]>
> > To: Parquet Dev <[email protected]>
> > Cc: Nagesh R Charka/India/IBM@IBMIN, Srinivas Mudigonda/India/IBM@IBMIN
> > Date: 03/11/2016 10:54 PM
> > Subject: Re: How to write "date, timestamp, decimal" data to Parquet-files
> >
> > Yes, it is supported in 1.2.1. It went in here:
> >
> > https://github.com/apache/hive/commit/912b4897ed457cfc447995b124ae84078287530b
> >
> > Are you using a version of Parquet with that pull request in it? Also, if you're using CDH this may not work.
> >
> > rb
> >
> > On Fri, Mar 11, 2016 at 12:40 AM, Ravi Tatapudi <[email protected]> wrote:
> >
> > > Hello Ryan:
> > >
> > > I am using hive-version: 1.2.1, as indicated below:
> > >
> > > --------------------------------------
> > > $ hive --version
> > > Hive 1.2.1
> > > Subversion git://localhost.localdomain/home/sush/dev/hive.git -r 243e7c1ac39cb7ac8b65c5bc6988f5cc3162f558
> > > Compiled by sush on Fri Jun 19 02:03:48 PDT 2015
> > > From source with checksum ab480aca41b24a9c3751b8c023338231
> > > $
> > > --------------------------------------
> > >
> > > As I understand, this version of "hive" supports the "date" datatype, right? Do you want me to re-test using any other higher version of Hive? Please let me know your thoughts.
> > >
> > > Thanks,
> > > Ravi
> > >
> > > From: Ryan Blue <[email protected]>
> > > To: Parquet Dev <[email protected]>
> > > Cc: Nagesh R Charka/India/IBM@IBMIN, Srinivas Mudigonda/India/IBM@IBMIN
> > > Date: 03/11/2016 06:18 AM
> > > Subject: Re: How to write "date, timestamp, decimal" data to Parquet-files
> > >
> > > What version of Hive are you using? You should make sure date is supported there.
> > >
> > > rb
> > >
> > > On Thu, Mar 10, 2016 at 3:11 AM, Ravi Tatapudi <[email protected]> wrote:
> > >
> > > > Hello Ryan:
> > > >
> > > > Many thanks for the reply. I see that the text-attachment containing my test-program was not sent to the mail-group, but got filtered out. Hence, copying the program-code below:
> > > >
> > > > =================================================================
> > > > import java.io.IOException;
> > > > import java.util.*;
> > > > import org.apache.hadoop.conf.Configuration;
> > > > import org.apache.hadoop.fs.FileSystem;
> > > > import org.apache.hadoop.fs.Path;
> > > > import org.apache.avro.Schema;
> > > > import org.apache.avro.Schema.Type;
> > > > import org.apache.avro.Schema.Field;
> > > > import org.apache.avro.generic.*;
> > > > import org.apache.avro.LogicalTypes;
> > > > import org.apache.avro.LogicalTypes.*;
> > > > import org.apache.hadoop.hive.common.type.HiveDecimal;
> > > > import parquet.avro.*;
> > > >
> > > > public class pqtw {
> > > >
> > > >     public static Schema makeSchema() {
> > > >         List<Field> fields = new ArrayList<Field>();
> > > >         fields.add(new Field("name", Schema.create(Type.STRING), null, null));
> > > >         fields.add(new Field("age", Schema.create(Type.INT), null, null));
> > > >
> > > >         Schema date = LogicalTypes.date().addToSchema(Schema.create(Type.INT));
> > > >         fields.add(new Field("doj", date, null, null));
> > > >
> > > >         Schema schema = Schema.createRecord("filecc", null, "parquet", false);
> > > >         schema.setFields(fields);
> > > >
> > > >         return(schema);
> > > >     }
> > > >
> > > >     public static GenericData.Record makeRecord(Schema schema, String name, int age, int doj) {
> > > >         GenericData.Record record = new GenericData.Record(schema);
> > > >         record.put("name", name);
> > > >         record.put("age", age);
> > > >         record.put("doj", doj);
> > > >         return(record);
> > > >     }
> > > >
> > > >     public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
> > > >
> > > >         String pqfile = "/tmp/pqtfile1";
> > > >
> > > >         try {
> > > >             Configuration conf = new Configuration();
> > > >             FileSystem fs = FileSystem.getLocal(conf);
> > > >
> > > >             Schema schema = makeSchema();
> > > >             GenericData.Record rec = makeRecord(schema, "abcd", 21, 15000);
> > > >             AvroParquetWriter writer = new AvroParquetWriter(new Path(pqfile), schema);
> > > >             writer.write(rec);
> > > >             writer.close();
> > > >         }
> > > >         catch (Exception e) {
> > > >             e.printStackTrace();
> > > >         }
> > > >     }
> > > > }
> > > > =================================================================
> > > >
> > > > With the above logic, I could write the data to a parquet-file. However, when I load it into a hive-table and select columns, I could select the "name" and "age" columns (i.e., the VARCHAR and INT columns) successfully, but selecting the "date" column failed with the error given below:
> > > >
> > > > --------------------------------------------------------------------------------
> > > > hive> CREATE TABLE PT1 (name varchar(10), age int, doj date) STORED AS PARQUET;
> > > > OK
> > > > Time taken: 0.369 seconds
> > > > hive> load data local inpath '/tmp/pqtfile1' into table PT1;
> > > > hive> SELECT name,age from PT1;
> > > > OK
> > > > abcd    21
> > > > Time taken: 0.311 seconds, Fetched: 1 row(s)
> > > > hive> SELECT doj from PT1;
> > > > OK
> > > > Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to org.apache.hadoop.hive.serde2.io.DateWritable
> > > > Time taken: 0.167 seconds
> > > > hive>
> > > > --------------------------------------------------------------------------------
> > > >
> > > > Basically, for the "date" datatype, I am trying to pass an integer value (the number of days from the Unix epoch, 1 January 1970, so that the date falls somewhere around 2011). Is this the correct approach to process date data, or is there another approach / API to do it? Could you please let me know your inputs in this regard?
> > > >
> > > > Thanks,
> > > > Ravi
> > > >
> > > > From: Ryan Blue <[email protected]>
> > > > To: Parquet Dev <[email protected]>
> > > > Cc: Nagesh R Charka/India/IBM@IBMIN, Srinivas Mudigonda/India/IBM@IBMIN
> > > > Date: 03/09/2016 10:48 PM
> > > > Subject: Re: How to write "date, timestamp, decimal" data to Parquet-files
> > > >
> > > > Hi Ravi,
> > > >
> > > > Not all of the types are fully-implemented yet. I think Hive only has partial support. If I remember correctly:
> > > > * Decimal is supported if the backing primitive type is fixed-length binary
> > > > * Date and Timestamp are supported, but Time has not been implemented yet
> > > >
> > > > For object models you can build applications on (instead of those embedded in SQL), only Avro objects can support those types through its LogicalTypes API. That API has been implemented in parquet-avro, but not yet committed. I would like for this feature to make it into 1.9.0. If you want to test in the mean time, check out the pull request:
> > > >
> > > > https://github.com/apache/parquet-mr/pull/318
> > > >
> > > > rb
> > > >
> > > > On Wed, Mar 9, 2016 at 5:09 AM, Ravi Tatapudi <[email protected]> wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > I am Ravi Tatapudi, from IBM-India.
> > > > > I am working on a simple test-tool that writes data to Parquet-files, which can be imported into hive-tables. Please find attached a sample program, which writes a simple parquet data-file:
> > > > >
> > > > > Using the above program, I could create "parquet-files" with data-types INT, LONG, STRING, Boolean, etc. (i.e., basically all data-types supported by org.apache.avro.Schema.Type) and load them into "hive" tables successfully.
> > > > >
> > > > > Now, I am trying to figure out how to write "date, timestamp, decimal" data into parquet-files. In this context, I request you to provide the possible options (and/or a sample program, if any) in this regard.
> > > > >
> > > > > Thanks,
> > > > > Ravi
> > > > >
> > > >
> > > > --
> > > > Ryan Blue
> > > > Software Engineer
> > > > Netflix
> > > >
> > >
> > > --
> > > Ryan Blue
> > > Software Engineer
> > > Netflix
> > >
> >
> > --
> > Ryan Blue
> > Software Engineer
> > Netflix
> >
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>

--
Ryan Blue
Software Engineer
Netflix
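
[Addendum] The ClassCastException reported earlier in this thread (IntWritable cannot be cast to DateWritable) is most likely because parquet-avro 1.6.0 ignores the Avro "date" logical type and writes the column as a plain INT32 without the DATE annotation, so Hive does not recognize it as a date column. The sketch below shows how the schema and writer side could look with avro-1.8.0 and a parquet-avro build that contains the logical-type support from https://github.com/apache/parquet-mr/pull/318 (targeted for 1.9.0). It is illustrative only: the class name, the extra "updated_at" and "salary" fields, the output path, the builder-style writer construction, and the 16-byte fixed backing for the decimal are assumptions made for this example rather than code from the thread, and the exact behaviour depends on the branch actually being built.

=================================================================
import java.io.IOException;
import java.math.BigDecimal;

import org.apache.avro.Conversions;
import org.apache.avro.LogicalTypes;
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericFixed;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;

public class LogicalTypesSketch {

    public static void main(String[] args) throws IOException {
        // date: an int annotated with the "date" logical type (days since 1970-01-01)
        Schema dateType = LogicalTypes.date().addToSchema(Schema.create(Schema.Type.INT));
        // timestamp: a long annotated with "timestamp-millis" (milliseconds since the epoch, UTC)
        Schema tsType = LogicalTypes.timestampMillis().addToSchema(Schema.create(Schema.Type.LONG));
        // decimal(10,2) backed by fixed-length binary, the representation Hive reads
        Schema decType = LogicalTypes.decimal(10, 2)
                .addToSchema(Schema.createFixed("salary_fixed", null, null, 16));

        Schema schema = SchemaBuilder.record("filecc").fields()
                .requiredString("name")
                .requiredInt("age")
                .name("doj").type(dateType).noDefault()
                .name("updated_at").type(tsType).noDefault()
                .name("salary").type(decType).noDefault()
                .endRecord();

        // Convert a BigDecimal into the fixed-length unscaled representation the schema declares.
        GenericFixed salary = new Conversions.DecimalConversion()
                .toFixed(new BigDecimal("1234.50"), decType, LogicalTypes.decimal(10, 2));

        GenericRecord rec = new GenericData.Record(schema);
        rec.put("name", "abcd");
        rec.put("age", 21);
        rec.put("doj", 15000);                          // days since the Unix epoch (~2011)
        rec.put("updated_at", System.currentTimeMillis());
        rec.put("salary", salary);

        ParquetWriter<GenericRecord> writer =
                AvroParquetWriter.<GenericRecord>builder(new Path("/tmp/pqtfile2"))
                        .withSchema(schema)
                        .build();
        writer.write(rec);
        writer.close();
    }
}
=================================================================

With the logical-type annotations carried into the Parquet schema, the intent is that a Hive table declared as (name varchar(10), age int, doj date, updated_at timestamp, salary decimal(10,2)) STORED AS PARQUET can read all of these columns; whether the timestamp mapping works in practice also depends on the Hive version in use.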
