Hi,
I have a use case where I want to write my data into Hive in the Parquet file 
format, and I want to do all of this with the Cascading framework. I came 
across the Cascading Parquet-Avro scheme (to convert the data to Parquet) and 
HiveTap (to write the data into a Hive table), which together do this job. The 
problem I am facing is that I was not able to write the Date datatype through 
Avro.

So I tried converting the dates into long values (since Avro does not provide 
a Date type in its schema), but that did not work either, because my Hive 
table has a field with the Date datatype. It throws the exception 
"LongWritable cannot be cast to DateWritable".

So is there any way I can write dates into a Hive table through Parquet-Avro?
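
To illustrate the workaround, here is a minimal sketch of the conversion I 
mean, assuming ISO "yyyy-MM-dd" input and an epoch-days encoding (epoch 
milliseconds via java.util.Date.getTime() would be the other common choice):

```java
import java.time.LocalDate;

public class DateToLongSketch {

    // Illustrative helper (not from my actual job): parse an ISO
    // "yyyy-MM-dd" string and return the number of days since the
    // Unix epoch as a long, so the value fits an Avro "long" field.
    static long toEpochDays(String isoDate) {
        return LocalDate.parse(isoDate).toEpochDay();
    }

    public static void main(String[] args) {
        // 1970-01-02 is one day after the Unix epoch
        System.out.println(toEpochDays("1970-01-02")); // prints 1
    }
}
```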

Below is my code for the same:
import java.io.IOException;
import java.lang.reflect.Type;
import java.util.Properties;

import org.apache.avro.Schema;

import cascading.flow.FlowDef;
import cascading.flow.hadoop2.Hadoop2MR1FlowConnector;
import cascading.pipe.Pipe;
import cascading.property.AppProps;
import cascading.scheme.hadoop.TextDelimited;
import cascading.tap.Tap;
import cascading.tap.hadoop.Hfs;
import cascading.tap.hive.HiveTableDescriptor;
import cascading.tap.hive.HiveTap;
import cascading.tuple.Fields;
import cascading.tuple.type.CoercibleType;
import cascading.tuple.type.DateType;
// plus the ParquetAvroScheme import from the parquet-cascading dependency

public class Test {

    public static void main(String[] args) throws IOException {

        String tableName = "datetest";
        CoercibleType dateType = new DateType("yyyy-MM-dd");

        // source fields: f1 and f2 are strings, f3 is coerced as a date
        Fields fields = new Fields("f1", "f2", "f3")
                .applyTypes(new Type[] { String.class, String.class, dateType });

        String[] columnNames = { "f1", "f2", "f3" };
        String[] columnTypes = { "string", "string", "date" };

        Tap source = new Hfs(new TextDelimited(fields, false, ","),
                "data/file2.txt");

        HiveTableDescriptor tableDesc = new HiveTableDescriptor("default",
                tableName, columnNames, columnTypes, new String[] {}, ",");

        HiveTap hiveTap = new HiveTap(tableDesc, new ParquetAvroScheme(
                new Schema.Parser().parse(Test.class.getClassLoader()
                        .getResourceAsStream("avro2.avsc"))));

        Pipe pipe = new Pipe("pipe");

        FlowDef flowDef = FlowDef.flowDef()
                .addSource(pipe, source)
                .addTailSink(pipe, hiveTap);

        Properties properties = new Properties();
        AppProps.setApplicationName(properties,
                "cascading hive integration demo");

        new Hadoop2MR1FlowConnector(properties).connect(flowDef).complete();
    }
}

avro2.avsc (Avro schema):
{
  "type": "record",
  "name": "avro",
  "fields": [
    { "name": "f1", "type": "string" },
    { "name": "f2", "type": "string" },
    { "name": "f3", "type": "long" }
  ]
}



Thanks,
Bhavesh
