Hi Daniel,

I am a beginner with cascading-parquet. As per your guideline, I have created the table with the following commands:
hive> create table test3(timestampField timestamp) stored as parquet;
hive> load data local inpath '/home/hduser/parquet_testing/part-00000-m-00000.parquet' into table test3;
hive> select * from test3;

After running the above commands I got the following output:

Output:
OK
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.BytesWritable cannot be cast to org.apache.hadoop.hive.serde2.io.TimestampWritable

Actually I want to create the Parquet file through cascading-parquet and load it into Hive. My Parquet file contains data that I want to store in a Hive column of type timestamp. But cascading-parquet has no timestamp datatype, so I declared the field as binary, and then tried to load it into a table whose field is of type timestamp. That is when I got the above exception. Please help me solve this problem.

Currently I am using:
Hive 1.1.0-cdh5.4.2
Cascading 2.5.1
parquet-format-2.2.0

Thanks,
Santlal Gupta

-----Original Message-----
From: Daniel Weeks [mailto:[email protected]]
Sent: Friday, July 17, 2015 9:09 PM
To: [email protected]
Subject: Re: Issue while reading Parquet file in Hive

Santlal,

It might just be as simple as the storage format for your hive table. I notice you say:

hive> create table timestampTest (timestampField timestamp);

But this should be:

hive> create table timestampTest (timestampField timestamp) stored as parquet;

Hive is probably processing the file as text. Please do a 'hive> desc formatted timestampTest;' and verify the input/output/serde for the table is actually parquet.

-Dan

On Thu, Jul 16, 2015 at 11:54 PM, Santlal J Gupta <[email protected]> wrote:

> Hello,
>
> I have the following issue.
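For context on the ClassCastException above: Hive's Parquet serde stores timestamp columns as INT96 values (8 bytes of nanoseconds-of-day followed by a 4-byte Julian day number, both little-endian), not as UTF8 binary, which is why a plain binary field cannot be cast to TimestampWritable. A minimal sketch of that byte layout, assuming this INT96 encoding (the class and method names here are illustrative, not from the thread):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.temporal.JulianFields;

public class Int96TimestampSketch {
    // Pack a timestamp the way Hive's Parquet timestamp reader expects:
    // 8 bytes nanos-of-day + 4 bytes Julian day, both little-endian (12 bytes total).
    static byte[] toInt96(LocalDateTime ts) {
        long nanosOfDay = ts.toLocalTime().toNanoOfDay();
        long julianDay = ts.toLocalDate().getLong(JulianFields.JULIAN_DAY);
        ByteBuffer buf = ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN);
        buf.putLong(nanosOfDay);
        buf.putInt((int) julianDay);
        return buf.array();
    }

    public static void main(String[] args) {
        LocalDateTime ts = LocalDateTime.parse("1988-05-25T15:15:15.254");
        System.out.println("packed length = " + toInt96(ts).length); // 12
        // Julian day number of the Unix epoch, 1970-01-01:
        System.out.println(LocalDate.of(1970, 1, 1)
                .getLong(JulianFields.JULIAN_DAY)); // 2440588
    }
}
```

So a Parquet file whose timestamp field is written as a UTF8 string under a binary schema element will not be readable by a Hive timestamp column, regardless of the table's storage format.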
>
> I have created a Parquet file through cascading-parquet and want to
> load it into a Hive table. The Parquet file loads successfully, but
> when I try to read it, the query returns null instead of the actual
> data. Please find the code below.
>
> package com.parquet.TimestampTest;
>
> import cascading.flow.FlowDef;
> import cascading.flow.hadoop.HadoopFlowConnector;
> import cascading.pipe.Pipe;
> import cascading.scheme.Scheme;
> import cascading.scheme.hadoop.TextDelimited;
> import cascading.tap.SinkMode;
> import cascading.tap.Tap;
> import cascading.tap.hadoop.Hfs;
> import cascading.tuple.Fields;
> import parquet.cascading.ParquetTupleScheme;
>
> public class GenrateTimeStampParquetFile {
>     static String inputPath = "target/input/timestampInputFile";
>     static String outputPath = "target/parquetOutput/TimestampOutput";
>
>     public static void main(String[] args) {
>         write();
>     }
>
>     private static void write() {
>         Fields field = new Fields("timestampField").applyTypes(String.class);
>         Scheme sourceSch = new TextDelimited(field, true, "\n");
>
>         Fields outputField = new Fields("timestampField");
>
>         Scheme sinkSch = new ParquetTupleScheme(field, outputField,
>                 "message TimeStampTest{optional binary timestampField;}");
>
>         Tap source = new Hfs(sourceSch, inputPath);
>         Tap sink = new Hfs(sinkSch, outputPath, SinkMode.REPLACE);
>
>         Pipe pipe = new Pipe("Hive timestamp");
>
>         FlowDef fd = FlowDef.flowDef().addSource(pipe, source)
>                 .addTailSink(pipe, sink);
>
>         new HadoopFlowConnector().connect(fd).complete();
>     }
> }
>
> Input file:
>
> timestampInputFile
>
> timestampField
> 1988-05-25 15:15:15.254
> 1987-05-06 14:14:25.362
>
> After running the code, the following files are generated.
>
> Output:
> 1. part-00000-m-00000.parquet
> 2. _SUCCESS
> 3. _metadata
> 4. _common_metadata
>
> I have created the table in Hive to load the
> part-00000-m-00000.parquet file.
> The file loads successfully, but reading it gives null values.
>
> I used the following commands:
>
> hive> create table timestampTest (timestampField timestamp);
>
> hive> load data local inpath '/home/hduser/parquet_testing/part-00000-m-00000.parquet' into table timestampTest;
> Loading data to table parquet_timestamp_test.timestamptest
> Table parquet_timestamp_test.timestamptest stats: [numFiles=1, totalSize=296]
> OK
> Time taken: 0.508 seconds
>
> hive> select * from timestamptest;
> OK
> NULL
> NULL
> NULL
> Time taken: 0.104 seconds, Fetched: 3 row(s)
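Since cascading-parquet exposes no timestamp type, one possible workaround (a suggestion, not something from the thread) is to keep the binary Parquet field, declare the Hive column as string, and cast at query time, e.g. select cast(timestampField as timestamp) from test3 -- provided the stored strings follow the yyyy-MM-dd HH:mm:ss.fff format that Hive's string-to-timestamp cast accepts. A small sketch to verify the input rows match that format (the class name is illustrative):

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class TimestampStringCheck {
    // Check that a string strictly matches the yyyy-MM-dd HH:mm:ss.SSS
    // pattern used in the input file, via a round-trip parse/format.
    static boolean matchesHiveFormat(String s) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        fmt.setLenient(false);
        try {
            Date d = fmt.parse(s);
            return fmt.format(d).equals(s); // strict round-trip
        } catch (ParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(matchesHiveFormat("1988-05-25 15:15:15.254")); // true
        System.out.println(matchesHiveFormat("not a timestamp"));         // false
    }
}
```

If the rows pass this check, a string-typed Hive column over the binary field should at least return the raw values instead of NULL, and the cast can then be applied in queries or in a view.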
