Yes, it is supported in 1.2.1. It went in here:
https://github.com/apache/hive/commit/912b4897ed457cfc447995b124ae84078287530b

Are you using a version of Parquet with that pull request in it? Also, if
you're using CDH this may not work.

rb

On Fri, Mar 11, 2016 at 12:40 AM, Ravi Tatapudi <[email protected]> wrote:

> Hello Ryan:
>
> I am using hive-version 1.2.1, as indicated below:
>
> --------------------------------------
> $ hive --version
> Hive 1.2.1
> Subversion git://localhost.localdomain/home/sush/dev/hive.git -r
> 243e7c1ac39cb7ac8b65c5bc6988f5cc3162f558
> Compiled by sush on Fri Jun 19 02:03:48 PDT 2015
> From source with checksum ab480aca41b24a9c3751b8c023338231
> $
> --------------------------------------
>
> As I understand, this version of Hive supports the "date" datatype,
> right? Do you want me to re-test using a higher version of Hive? Please
> let me know your thoughts.
>
> Thanks,
> Ravi
>
>
>
> From: Ryan Blue <[email protected]>
> To: Parquet Dev <[email protected]>
> Cc: Nagesh R Charka/India/IBM@IBMIN, Srinivas
> Mudigonda/India/IBM@IBMIN
> Date: 03/11/2016 06:18 AM
> Subject: Re: How to write "date, timestamp, decimal" data to
> Parquet-files
>
>
>
> What version of Hive are you using? You should make sure date is supported
> there.
>
> rb
>
> On Thu, Mar 10, 2016 at 3:11 AM, Ravi Tatapudi <[email protected]>
> wrote:
>
> > Hello Ryan:
> >
> > Many thanks for the reply. I see that the text attachment containing my
> > test program was not sent to the mailing list, but got filtered out.
> > Hence, I am copying the program code below:
> >
> > =================================================================
> > import java.io.IOException;
> > import java.util.*;
> > import org.apache.hadoop.conf.Configuration;
> > import org.apache.hadoop.fs.FileSystem;
> > import org.apache.hadoop.fs.Path;
> > import org.apache.avro.Schema;
> > import org.apache.avro.Schema.Type;
> > import org.apache.avro.Schema.Field;
> > import org.apache.avro.generic.*;
> > import org.apache.avro.LogicalTypes;
> > import org.apache.avro.LogicalTypes.*;
> > import org.apache.hadoop.hive.common.type.HiveDecimal;
> > import parquet.avro.*;
> >
> > public class pqtw {
> >
> >     public static Schema makeSchema() {
> >         List<Field> fields = new ArrayList<Field>();
> >         fields.add(new Field("name", Schema.create(Type.STRING), null, null));
> >         fields.add(new Field("age", Schema.create(Type.INT), null, null));
> >
> >         Schema date = LogicalTypes.date().addToSchema(Schema.create(Type.INT));
> >         fields.add(new Field("doj", date, null, null));
> >
> >         Schema schema = Schema.createRecord("filecc", null, "parquet", false);
> >         schema.setFields(fields);
> >
> >         return schema;
> >     }
> >
> >     public static GenericData.Record makeRecord(Schema schema, String name,
> >             int age, int doj) {
> >         GenericData.Record record = new GenericData.Record(schema);
> >         record.put("name", name);
> >         record.put("age", age);
> >         record.put("doj", doj);
> >         return record;
> >     }
> >
> >     public static void main(String[] args) throws IOException,
> >             InterruptedException, ClassNotFoundException {
> >
> >         String pqfile = "/tmp/pqtfile1";
> >
> >         try {
> >             Configuration conf = new Configuration();
> >             FileSystem fs = FileSystem.getLocal(conf);
> >
> >             Schema schema = makeSchema();
> >             GenericData.Record rec = makeRecord(schema, "abcd", 21, 15000);
> >             AvroParquetWriter writer = new AvroParquetWriter(new Path(pqfile),
> >                     schema);
> >             writer.write(rec);
> >             writer.close();
> >         }
> >         catch (Exception e)
> >         {
> >             e.printStackTrace();
> >         }
> >     }
> > }
> > =================================================================
> >
> > With the above logic, I could write the data to the parquet-file. However,
> > when I load the same into a hive-table & select columns, I could select
> > the "name" and "age" columns (i.e., the VARCHAR, INT columns) successfully,
> > but the select of the "date" column failed with the error given below:
> >
> > --------------------------------------------------------------------------------
> > hive> CREATE TABLE PT1 (name varchar(10), age int, doj date) STORED AS
> > PARQUET;
> > OK
> > Time taken: 0.369 seconds
> > hive> load data local inpath '/tmp/pqtfile1' into table PT1;
> > hive> SELECT name,age from PT1;
> > OK
> > abcd    21
> > Time taken: 0.311 seconds, Fetched: 1 row(s)
> > hive> SELECT doj from PT1;
> > OK
> > Failed with exception
> > java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException:
> > java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be
> > cast to org.apache.hadoop.hive.serde2.io.DateWritable
> > Time taken: 0.167 seconds
> > hive>
> > --------------------------------------------------------------------------------
> >
> > Basically, for the "date" datatype, I am trying to pass an integer value
> > (the number of days from the Unix epoch, 1 January 1970, so that the date
> > falls somewhere around 2011, etc.). Is this the correct approach to process
> > date data (or is there another approach / API to do it)? Could you please
> > let me know your inputs in this regard?
> >
> > Thanks,
> > Ravi
> >
> >
> >
> > From: Ryan Blue <[email protected]>
> > To: Parquet Dev <[email protected]>
> > Cc: Nagesh R Charka/India/IBM@IBMIN, Srinivas
> > Mudigonda/India/IBM@IBMIN
> > Date: 03/09/2016 10:48 PM
> > Subject: Re: How to write "date, timestamp, decimal" data to
> > Parquet-files
> >
> >
> >
> > Hi Ravi,
> >
> > Not all of the types are fully implemented yet.
> > I think Hive only has partial support. If I remember correctly:
> > * Decimal is supported if the backing primitive type is fixed-length
> >   binary
> > * Date and Timestamp are supported, but Time has not been implemented yet
> >
> > For object models you can build applications on (instead of those embedded
> > in SQL), only Avro objects can support those types, through its
> > LogicalTypes API. That API has been implemented in parquet-avro, but not
> > yet committed. I would like for this feature to make it into 1.9.0. If you
> > want to test in the meantime, check out the pull request:
> >
> > https://github.com/apache/parquet-mr/pull/318
> >
> > rb
> >
> > On Wed, Mar 9, 2016 at 5:09 AM, Ravi Tatapudi <[email protected]>
> > wrote:
> >
> > > Hello,
> > >
> > > I am Ravi Tatapudi, from IBM-India. I am working on a simple test tool
> > > that writes data to Parquet-files, which can be imported into
> > > hive-tables. Please find attached a sample program, which writes a
> > > simple parquet data file.
> > >
> > > Using the above program, I could create parquet-files with the
> > > data types INT, LONG, STRING, BOOLEAN, etc. (i.e., basically all
> > > data types supported by org.apache.avro.Schema.Type) & load them into
> > > Hive tables successfully.
> > >
> > > Now, I am trying to figure out how to write "date, timestamp, decimal"
> > > data into parquet-files. In this context, I request you to provide the
> > > possible options (and/or a sample program, if any) in this regard.
> > >
> > > Thanks,
> > > Ravi
> > >
> >
> > --
> > Ryan Blue
> > Software Engineer
> > Netflix
> >
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>

--
Ryan Blue
Software Engineer
Netflix
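
[Editor's note: Ravi's integer encoding is in fact what the `date` logical type expects, namely the day count since 1970-01-01; the `ClassCastException` above arises because the writer he used did not attach the logical-type annotation, which is what the pull request Ryan links addresses. A minimal pure-JDK sketch of the day-count round trip (the class name `EpochDays` is illustrative, not a parquet-mr API):]

```java
import java.time.LocalDate;

public class EpochDays {
    // The Avro/Parquet `date` logical type annotates an INT whose value is
    // the number of days since the Unix epoch (1970-01-01). java.time can
    // produce and consume that value directly.
    static int toEpochDays(LocalDate d) {
        return (int) d.toEpochDay();
    }

    public static void main(String[] args) {
        // The sample record above stores 15000, which decodes to a 2011 date:
        System.out.println(LocalDate.ofEpochDay(15000));            // 2011-01-26
        System.out.println(toEpochDays(LocalDate.of(2011, 1, 26))); // 15000
    }
}
```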
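[Editor's note: Ryan's remark that decimal needs a fixed-length binary backing refers to Parquet's DECIMAL layout: the unscaled value stored as a big-endian two's-complement integer, sign-extended to the fixed size. A pure-JDK sketch of that byte layout (class and method names are illustrative, not parquet-mr API):]

```java
import java.math.BigDecimal;
import java.util.Arrays;

public class DecimalBytes {
    // Encode a decimal's unscaled value as big-endian two's-complement,
    // sign-extended to a fixed width -- the layout Parquet uses for DECIMAL
    // backed by FIXED_LEN_BYTE_ARRAY. `size` must be large enough to hold
    // the declared precision.
    static byte[] toFixedBytes(BigDecimal value, int scale, int size) {
        byte[] unscaled = value.setScale(scale).unscaledValue().toByteArray();
        byte[] out = new byte[size];
        // Sign-extend: fill the leading bytes with 0xFF for negative values
        byte pad = (byte) (value.signum() < 0 ? 0xFF : 0x00);
        Arrays.fill(out, 0, size - unscaled.length, pad);
        System.arraycopy(unscaled, 0, out, size - unscaled.length, unscaled.length);
        return out;
    }

    public static void main(String[] args) {
        // decimal(9,2): 123.45 -> unscaled 12345 (0x3039), padded to 4 bytes
        byte[] b = toFixedBytes(new BigDecimal("123.45"), 2, 4);
        System.out.println(Arrays.toString(b)); // [0, 0, 48, 57]
    }
}
```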
