On Thu, Apr 15, 2010 at 3:00 PM, Arvind Prabhakar <[email protected]> wrote:
> Hi Sagar,
>
> Looks like your source file has custom writable types in it. If that is
> the case, implementing a SerDe that works with that type may not be that
> straightforward, although it is doable.
>
> An alternative would be to implement a custom RecordReader that converts
> the value of your custom writable to a Struct type, which can then be
> queried directly.
>
> Arvind
>
>
> On Thu, Apr 15, 2010 at 1:06 AM, Sagar Naik <[email protected]> wrote:
>
>> Hi,
>>
>> My data is in the value field of a sequence file. The value field has
>> subfields in it, and I am trying to create a table using these subfields.
>> Example:
>>
>> <KEY> <VALUE>
>> <KEY_FIELD1, KEY_FIELD2> forms the key
>> <VALUE_FIELD1, VALUE_FIELD2, VALUE_FIELD3> forms the value
>>
>> So I am trying to create a table from VALUE_FIELD*:
>>
>> CREATE EXTERNAL TABLE table_name (VALUE_FIELD1 BIGINT, VALUE_FIELD2
>> STRING, VALUE_FIELD3 BIGINT) STORED AS SEQUENCEFILE;
>>
>> I am planning to write a custom SerDe implementation and a custom
>> SequenceFileReader. Please let me know if I am on the right track.
>>
>> -Sagar
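For illustration, here is a rough, untested sketch of the RecordReader approach Arvind describes. "CustomWritable" and its getField*() accessors are invented placeholders for whatever writable actually sits in the value slot of the file; the idea is to flatten the subfields into delimited Text that Hive's default LazySimpleSerDe can split into columns, avoiding a full custom SerDe:

// Untested sketch. CustomWritable and its getField*() methods are
// placeholders for the actual custom writable stored in the value.
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.SequenceFileRecordReader;

public class CustomWritableRecordReader
    implements RecordReader<LongWritable, Text> {

  private final SequenceFileRecordReader<Writable, CustomWritable> reader;
  private final Writable key;
  private final CustomWritable value;
  private long row = 0;

  public CustomWritableRecordReader(JobConf conf, InputSplit split)
      throws IOException {
    reader = new SequenceFileRecordReader<Writable, CustomWritable>(
        conf, (FileSplit) split);
    key = reader.createKey();
    value = reader.createValue();
  }

  public boolean next(LongWritable k, Text v) throws IOException {
    if (!reader.next(key, value)) {
      return false;
    }
    k.set(row++);
    // Flatten the subfields into a ^A-separated line, which Hive's
    // default delimited SerDe can split into table columns.
    v.set(value.getField1() + "\001"
        + value.getField2() + "\001"
        + value.getField3());
    return true;
  }

  public LongWritable createKey() { return new LongWritable(); }
  public Text createValue() { return new Text(); }
  public long getPos() throws IOException { return reader.getPos(); }
  public float getProgress() throws IOException { return reader.getProgress(); }
  public void close() throws IOException { reader.close(); }
}

Returned from a custom InputFormat's getRecordReader(), something like this would let the table be declared with plain BIGINT/STRING columns and ROW FORMAT DELIMITED, with no custom SerDe needed.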
I am actually having lots of trouble with this. I have a sequence file that
opens fine with:

/home/edward/hadoop/hadoop-0.20.2/bin/hadoop dfs -text /home/edward/Downloads/seq/seq

I created the table like this:

create external table keyonly (ver string, theid int, thedate string)
row format delimited fields terminated by ','
stored as
  inputformat 'org.apache.hadoop.mapred.SequenceFileAsTextInputFormat'
  outputformat 'org.apache.hadoop.hive.ql.io.HiveNullValueSequenceFileOutputFormat'
location '/home/edward/Downloads/seq';

I also tried inputformat 'org.apache.hadoop.mapred.SequenceFileInputFormat',
and plain stored as SEQUENCEFILE. I always get this:

2010-04-15 13:10:43,849 ERROR CliDriver (SessionState.java:printError(255)) - Failed with exception java.io.IOException:java.io.EOFException
java.io.IOException: java.io.EOFException
        at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:332)
        at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:120)
        at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:681)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:146)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
        at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:510)
        at org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_key_only(TestCliDriver.java:79)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at junit.framework.TestCase.runTest(TestCase.java:154)
        at junit.framework.TestCase.runBare(TestCase.java:127)
        at junit.framework.TestResult$1.protect(TestResult.java:106)
        at junit.framework.TestResult.runProtected(TestResult.java:124)
        at junit.framework.TestResult.run(TestResult.java:109)
        at junit.framework.TestCase.run(TestCase.java:118)
        at junit.framework.TestSuite.runTest(TestSuite.java:208)
        at junit.framework.TestSuite.run(TestSuite.java:203)
        at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:422)
        at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:931)
        at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:785)
Caused by: java.io.EOFException
        at java.util.zip.GZIPInputStream.readUByte(GZIPInputStream.java:207)
        at java.util.zip.GZIPInputStream.readUShort(GZIPInputStream.java:197)
        at java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:136)
        at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:58)
        at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:68)
        at org.apache.hadoop.io.compress.GzipCodec$GzipInputStream$ResetableGZIPInputStream.<init>(GzipCodec.java:92)
        at org.apache.hadoop.io.compress.GzipCodec$GzipInputStream.<init>(GzipCodec.java:101)
        at org.apache.hadoop.io.compress.GzipCodec.createInputStream(GzipCodec.java:169)
        at org.apache.hadoop.io.compress.GzipCodec.createInputStream(GzipCodec.java:179)
        at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1520)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1428)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
        at org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:43)
        at org.apache.hadoop.mapred.SequenceFileAsTextRecordReader.<init>(SequenceFileAsTextRecordReader.java:44)
        at org.apache.hadoop.mapred.SequenceFileAsTextInputFormat.getRecordReader(SequenceFileAsTextInputFormat.java:43)
        at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:296)
        at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:311)
        ... 21 more

Does anyone have a clue on what I am doing wrong??
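The "Caused by" section is worth a close look: the EOFException comes out of java.util.zip.GZIPInputStream while GzipCodec.createInputStream is being called from SequenceFile$Reader.init, so the reader dies while opening the file, before any rows are fetched. The ResetableGZIPInputStream frames are GzipCodec's pure-Java fallback path, which suggests (the trace alone does not prove it) that the native Hadoop compression libraries were not loaded in the Hive run, while the bin/hadoop invocation that works may have them on java.library.path. A small standalone check can confirm what the file header actually declares; this is an untested sketch, the class name is invented, and the path is taken from the command above:

// Untested sketch: dump what the sequence file header declares.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.util.NativeCodeLoader;

public class SeqFileInfo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path path = new Path("/home/edward/Downloads/seq/seq");
    FileSystem fs = path.getFileSystem(conf);
    SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
    try {
      System.out.println("key class:          " + reader.getKeyClassName());
      System.out.println("value class:        " + reader.getValueClassName());
      System.out.println("compressed:         " + reader.isCompressed());
      System.out.println("block compressed:   " + reader.isBlockCompressed());
      System.out.println("codec:              " + reader.getCompressionCodec());
      System.out.println("native code loaded: " + NativeCodeLoader.isNativeCodeLoaded());
    } finally {
      reader.close();
    }
  }
}

If the file is block compressed with GzipCodec and the native code is not loaded, that combination is a plausible culprit: the reader hands the pure-Java gzip stream an initially empty buffer during init, and the gzip header read then fails with exactly this kind of EOFException.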
