Hello Ryan: I am using Hive version 1.2.1, as indicated below:
--------------------------------------
$ hive --version
Hive 1.2.1
Subversion git://localhost.localdomain/home/sush/dev/hive.git -r 243e7c1ac39cb7ac8b65c5bc6988f5cc3162f558
Compiled by sush on Fri Jun 19 02:03:48 PDT 2015
From source with checksum ab480aca41b24a9c3751b8c023338231
$
--------------------------------------

As I understand it, this version of Hive supports the "date" datatype, right? Do you want me to re-test using a higher version of Hive? Please let me know your thoughts.

Thanks,
Ravi



From: Ryan Blue <[email protected]>
To: Parquet Dev <[email protected]>
Cc: Nagesh R Charka/India/IBM@IBMIN, Srinivas Mudigonda/India/IBM@IBMIN
Date: 03/11/2016 06:18 AM
Subject: Re: How to write "date, timestamp, decimal" data to Parquet-files

What version of Hive are you using? You should make sure date is supported
there.

rb

On Thu, Mar 10, 2016 at 3:11 AM, Ravi Tatapudi <[email protected]> wrote:

> Hello Ryan:
>
> Many thanks for the reply. I see that the text attachment containing my
> test program was not sent to the mailing list but got filtered out.
> Hence, I am copying the program code below:
>
> =================================================================
> import java.io.IOException;
> import java.util.*;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
> import org.apache.avro.Schema;
> import org.apache.avro.Schema.Type;
> import org.apache.avro.Schema.Field;
> import org.apache.avro.generic.*;
> import org.apache.avro.LogicalTypes;
> import org.apache.avro.LogicalTypes.*;
> import org.apache.hadoop.hive.common.type.HiveDecimal;
> import parquet.avro.*;
>
> public class pqtw {
>
>     public static Schema makeSchema() {
>         List<Field> fields = new ArrayList<Field>();
>         fields.add(new Field("name", Schema.create(Type.STRING), null, null));
>         fields.add(new Field("age", Schema.create(Type.INT), null, null));
>
>         // "date" logical type: an INT holding days since the Unix epoch
>         Schema date = LogicalTypes.date().addToSchema(Schema.create(Type.INT));
>         fields.add(new Field("doj", date, null, null));
>
>         Schema schema = Schema.createRecord("filecc", null, "parquet", false);
>         schema.setFields(fields);
>
>         return schema;
>     }
>
>     public static GenericData.Record makeRecord(Schema schema, String name,
>             int age, int doj) {
>         GenericData.Record record = new GenericData.Record(schema);
>         record.put("name", name);
>         record.put("age", age);
>         record.put("doj", doj);
>         return record;
>     }
>
>     public static void main(String[] args) throws IOException,
>             InterruptedException, ClassNotFoundException {
>         String pqfile = "/tmp/pqtfile1";
>         try {
>             Configuration conf = new Configuration();
>             FileSystem fs = FileSystem.getLocal(conf);
>
>             Schema schema = makeSchema();
>             GenericData.Record rec = makeRecord(schema, "abcd", 21, 15000);
>             AvroParquetWriter<GenericData.Record> writer =
>                 new AvroParquetWriter<GenericData.Record>(new Path(pqfile), schema);
>             writer.write(rec);
>             writer.close();
>         } catch (Exception e) {
>             e.printStackTrace();
>         }
>     }
> }
> =================================================================
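>
> For reference, the integer passed for "doj" above is a day count from the
> Unix epoch (1 January 1970). A minimal sketch of how that value can be
> derived with java.time (assuming Java 8; this helper class is illustrative
> and not part of the test program):
>
> =================================================================
> import java.time.LocalDate;
>
> public class EpochDays {
>     public static void main(String[] args) {
>         // LocalDate.toEpochDay() returns the number of days since
>         // 1970-01-01 as a long; narrow it to the int that the Avro
>         // "date" logical type stores.
>         int doj = (int) LocalDate.of(2011, 1, 26).toEpochDay();
>         System.out.println(doj);                       // 15000
>
>         // And in the other direction, to check a stored value:
>         System.out.println(LocalDate.ofEpochDay(doj)); // 2011-01-26
>     }
> }
> =================================================================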
>
> With the above logic, I could write the data to the parquet-file.
> However, when I load the same into a hive-table and select columns, I
> could select the "name" and "age" columns (i.e., the VARCHAR and INT
> columns) successfully, but selecting the "date" column failed with the
> error given below:
>
> --------------------------------------------------------------------------------
> hive> CREATE TABLE PT1 (name varchar(10), age int, doj date) STORED AS PARQUET;
> OK
> Time taken: 0.369 seconds
> hive> load data local inpath '/tmp/pqtfile1' into table PT1;
> hive> SELECT name,age from PT1;
> OK
> abcd    21
> Time taken: 0.311 seconds, Fetched: 1 row(s)
> hive> SELECT doj from PT1;
> OK
> Failed with exception
> java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException:
> java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be
> cast to org.apache.hadoop.hive.serde2.io.DateWritable
> Time taken: 0.167 seconds
> hive>
> --------------------------------------------------------------------------------
>
> Basically, for the "date" datatype, I am passing an integer value (the
> number of days since the Unix epoch, 1 January 1970, so that the date
> falls somewhere around 2011). Is this the correct approach to process
> date data, or is there another approach/API for it? Could you please let
> me know your inputs in this regard?
>
> Thanks,
> Ravi
>
>
>
> From: Ryan Blue <[email protected]>
> To: Parquet Dev <[email protected]>
> Cc: Nagesh R Charka/India/IBM@IBMIN, Srinivas Mudigonda/India/IBM@IBMIN
> Date: 03/09/2016 10:48 PM
> Subject: Re: How to write "date, timestamp, decimal" data to Parquet-files
>
> Hi Ravi,
>
> Not all of the types are fully implemented yet. I think Hive only has
> partial support. If I remember correctly:
> * Decimal is supported if the backing primitive type is fixed-length
>   binary
> * Date and Timestamp are supported, but Time has not been implemented yet
>
> For object models you can build applications on (instead of those
> embedded in SQL), only Avro objects can support those types through its
> LogicalTypes API. That API has been implemented in parquet-avro, but not
> yet committed. I would like for this feature to make it into 1.9.0. If
> you want to test in the meantime, check out the pull request:
>
> https://github.com/apache/parquet-mr/pull/318
>
> rb
>
> On Wed, Mar 9, 2016 at 5:09 AM, Ravi Tatapudi <[email protected]>
> wrote:
>
> > Hello,
> >
> > I am Ravi Tatapudi, from IBM India. I am working on a simple test tool
> > that writes data to Parquet-files, which can be imported into
> > hive-tables. Please find attached a sample program, which writes a
> > simple parquet-data-file:
> >
> >
> >
> > Using the above program, I could create "parquet-files" with the
> > data-types INT, LONG, STRING, BOOLEAN, etc. (i.e., basically all
> > data-types supported by org.apache.avro.Schema.Type) and load them into
> > "hive" tables successfully.
> >
> > Now, I am trying to figure out how to write "date, timestamp, decimal"
> > data into Parquet-files. In this context, could you please provide the
> > possible options (and/or a sample program, if any) in this regard?
> >
> > Thanks,
> > Ravi
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix



--
Ryan Blue
Software Engineer
Netflix
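
A minimal sketch of how the logical types discussed above (date,
timestamp-millis, and a fixed-backed decimal) are declared through Avro's
LogicalTypes API. This shows only the schema construction; whether
parquet-avro round-trips these annotations end-to-end depends on the pull
request referenced above, so the record and field names here are
illustrative assumptions, not code from this thread:

=================================================================
import java.util.Arrays;

import org.apache.avro.LogicalTypes;
import org.apache.avro.Schema;
import org.apache.avro.Schema.Field;
import org.apache.avro.Schema.Type;

public class LogicalTypeSchemas {
    public static void main(String[] args) {
        // date: days since 1970-01-01, annotating an INT
        Schema date = LogicalTypes.date().addToSchema(Schema.create(Type.INT));

        // timestamp-millis: milliseconds since the epoch, annotating a LONG
        Schema tsMillis =
            LogicalTypes.timestampMillis().addToSchema(Schema.create(Type.LONG));

        // decimal(precision, scale) on a fixed-length binary backing type,
        // the variant Ryan notes Hive supports; 16 bytes comfortably holds
        // precision 18 (the size choice is an assumption for this sketch)
        Schema decimal = LogicalTypes.decimal(18, 2)
            .addToSchema(Schema.createFixed("amount_fixed", null, null, 16));

        Schema record = Schema.createRecord("filecc2", null, "parquet", false);
        record.setFields(Arrays.asList(
            new Field("doj", date, null, null),
            new Field("updated_at", tsMillis, null, null),
            new Field("amount", decimal, null, null)));

        // Print the schema to confirm the logicalType annotations are set
        System.out.println(record.toString(true));
    }
}
=================================================================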
