Hi,

Int96 is not supported in Cascading's Parquet scheme; it supports Int32 and Int64. 
That is why I used binary instead of Int96.
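
Since int64 is supported, one alternative I could try is to write the value as 
int64 epoch milliseconds and read it in Hive as bigint. A minimal untested 
sketch, reusing the field/outputField names from my code below (the Hive 
statements in the comments are illustrative only, not something I have run):

    // Untested sketch: declare the column as int64 instead of binary.
    // Parsing the incoming string into a Long would still need a function
    // in the Cascading pipe; only the schema change is shown here.
    Scheme sinkSch = new ParquetTupleScheme(field, outputField,
            "message TimeStampTest { optional int64 timestampField; }");

    // On the Hive side the column would then be declared as bigint, e.g.
    //   create table test3(timestampField bigint) stored as parquet;
    // and converted on read (Hive interprets integer-to-timestamp casts
    // as seconds since epoch, hence the division):
    //   select cast(timestampField / 1000 as timestamp) from test3;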

Thanks,
Santlal J. Gupta

-----Original Message-----
From: Sergio Pena [mailto:sergio.p...@cloudera.com] 
Sent: Wednesday, August 5, 2015 11:00 PM
To: dev@hive.apache.org
Subject: Re: issue while reading parquet file in hive

Hi Santlal,

Hive uses the Parquet int96 type to write and read timestamps. That is probably 
the cause of the error: a binary column is read back as BytesWritable, which 
cannot be cast to the TimestampWritable that a timestamp column expects. You 
can try int96 instead of binary.
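
For example, only the schema string in your ParquetTupleScheme call would need 
to change, along these lines (untested sketch on my side):

    Scheme sinkSch = new ParquetTupleScheme(field, outputField,
            "message TimeStampTest { optional int96 timestampField; }");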

- Sergio

On Tue, Jul 21, 2015 at 1:54 AM, Santlal J Gupta < 
santlal.gu...@bitwiseglobal.com> wrote:

> Hello,
>
>
>
> I have the following issue.
>
>
>
> I have created a Parquet file through Cascading's Parquet scheme and want to
> load it into a Hive table.
>
> My data file contains data of type timestamp.
>
> Cascading's Parquet scheme does not support the timestamp data type, so while
> creating the Parquet file I declared the column as binary. After generating
> the Parquet file, it loaded into Hive successfully.
>
>
>
> While creating the Hive table, I declared the column type as timestamp.
>
>
>
> Code:
>
>
>
> package com.parquet.TimestampTest;
>
> import cascading.flow.FlowDef;
> import cascading.flow.hadoop.HadoopFlowConnector;
> import cascading.pipe.Pipe;
> import cascading.scheme.Scheme;
> import cascading.scheme.hadoop.TextDelimited;
> import cascading.tap.SinkMode;
> import cascading.tap.Tap;
> import cascading.tap.hadoop.Hfs;
> import cascading.tuple.Fields;
> import parquet.cascading.ParquetTupleScheme;
>
> public class GenrateTimeStampParquetFile {
>
>     static String inputPath = "target/input/timestampInputFile1";
>     static String outputPath = "target/parquetOutput/TimestampOutput";
>
>     public static void main(String[] args) {
>         write();
>     }
>
>     private static void write() {
>         // Source: one string column per newline-delimited record.
>         Fields field = new Fields("timestampField").applyTypes(String.class);
>         Scheme sourceSch = new TextDelimited(field, false, "\n");
>
>         Fields outputField = new Fields("timestampField");
>
>         // Sink: the Parquet schema declares the timestamp column as binary.
>         Scheme sinkSch = new ParquetTupleScheme(field, outputField,
>                 "message TimeStampTest { optional binary timestampField; }");
>
>         Tap source = new Hfs(sourceSch, inputPath);
>         Tap sink = new Hfs(sinkSch, outputPath, SinkMode.REPLACE);
>
>         Pipe pipe = new Pipe("Hive timestamp");
>
>         FlowDef fd = FlowDef.flowDef().addSource(pipe, source).addTailSink(pipe, sink);
>
>         new HadoopFlowConnector().connect(fd).complete();
>     }
> }
>
>
>
> Input file:
>
>
>
> timestampInputFile1
>
>
>
> timestampField
>
> 1988-05-25 15:15:15.254
>
> 1987-05-06 14:14:25.362
>
>
>
> After running the code, the following files are generated.
>
> Output:
>
> 1. part-00000-m-00000.parquet
>
> 2. _SUCCESS
>
> 3. _metadata
>
> 4. _common_metadata
>
>
>
> I created a table in Hive to load the part-00000-m-00000.parquet file.
>
>
>
> I ran the following statements in Hive.
>
> Query:
>
>
>
> hive> create table test3(timestampField timestamp) stored as parquet;
>
> hive> load data local inpath '/home/hduser/parquet_testing/part-00000-m-00000.parquet' into table test3;
>
> hive> select * from test3;
>
>
>
> After running the above statements, I got the following output.
>
>
>
> Output:
>
>
>
> OK
>
> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
>
> SLF4J: Defaulting to no-operation (NOP) logger implementation
>
> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for 
> further details.
>
> Failed with exception
> java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException:
> java.lang.ClassCastException: org.apache.hadoop.io.BytesWritable 
> cannot be cast to org.apache.hadoop.hive.serde2.io.TimestampWritable
>
>
>
>
>
> Instead of the table data, I got the above exception.
>
>
>
> Please help me solve this problem.
>
>
>
> Currently I am using:
>
>     Hive 1.1.0-cdh5.4.2
>     Cascading 2.5.1
>     parquet-format-2.2.0
>
>
>
> Thanks
>
> Santlal J. Gupta
>
>
>
>
>
>
