On Thu, Apr 15, 2010 at 1:23 PM, Edward Capriolo <[email protected]> wrote:
>
> On Thu, Apr 15, 2010 at 3:00 PM, Arvind Prabhakar <[email protected]> wrote:
>
>> Hi Sagar,
>>
>> Looks like your source file has custom writable types in it. If that is
>> the case, implementing a SerDe that works with that type may not be
>> straightforward, although it is doable.
>>
>> An alternative would be to implement a custom RecordReader that converts
>> the value of your custom writable to a Struct type, which can then be
>> queried directly.
>>
>> Arvind
>>
>> On Thu, Apr 15, 2010 at 1:06 AM, Sagar Naik <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> My data is in the value field of a sequence file. The value field has
>>> subfields in it, and I am trying to create a table using these subfields.
>>> Example:
>>>
>>> <KEY> <VALUE>
>>> <KEY_FIELD1, KEY_FIELD2> forms the key
>>> <VALUE_FIELD1, VALUE_FIELD2, VALUE_FIELD3> forms the value
>>>
>>> So I am trying to create a table from VALUE_FIELD*:
>>>
>>> CREATE EXTERNAL TABLE table_name (VALUE_FIELD1 BIGINT, VALUE_FIELD2
>>> STRING, VALUE_FIELD3 BIGINT) STORED AS SEQUENCEFILE;
>>>
>>> I am planning to write a custom SerDe implementation and a custom
>>> SequenceFileReader. Please let me know if I am on the right track.
>>>
>>> -Sagar
>
> I am actually having lots of trouble with this.
> I have a sequence file that opens fine with
>
> /home/edward/hadoop/hadoop-0.20.2/bin/hadoop dfs -text /home/edward/Downloads/seq/seq
>
> create external table keyonly( ver string, theid int, thedate string )
> row format delimited fields terminated by ','
> STORED AS
> inputformat 'org.apache.hadoop.mapred.SequenceFileAsTextInputFormat'
> outputformat 'org.apache.hadoop.hive.ql.io.HiveNullValueSequenceFileOutputFormat'
> location '/home/edward/Downloads/seq';
>
> I also tried
> inputformat 'org.apache.hadoop.mapred.SequenceFileInputFormat'
> or stored as SEQUENCEFILE.
>
> I always get this:
>
> 2010-04-15 13:10:43,849 ERROR CliDriver (SessionState.java:printError(255))
> - Failed with exception java.io.IOException:java.io.EOFException
> java.io.IOException: java.io.EOFException
>         at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:332)
>         at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:120)
>         at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:681)
>         at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:146)
>         at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
>         at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:510)
>         at org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_key_only(TestCliDriver.java:79)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at junit.framework.TestCase.runTest(TestCase.java:154)
>         at junit.framework.TestCase.runBare(TestCase.java:127)
>         at junit.framework.TestResult$1.protect(TestResult.java:106)
>         at junit.framework.TestResult.runProtected(TestResult.java:124)
>         at junit.framework.TestResult.run(TestResult.java:109)
>         at junit.framework.TestCase.run(TestCase.java:118)
>         at junit.framework.TestSuite.runTest(TestSuite.java:208)
>         at junit.framework.TestSuite.run(TestSuite.java:203)
>         at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:422)
>         at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:931)
>         at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:785)
> Caused by: java.io.EOFException
>         at java.util.zip.GZIPInputStream.readUByte(GZIPInputStream.java:207)
>         at java.util.zip.GZIPInputStream.readUShort(GZIPInputStream.java:197)
>         at java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:136)
>         at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:58)
>         at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:68)
>         at org.apache.hadoop.io.compress.GzipCodec$GzipInputStream$ResetableGZIPInputStream.<init>(GzipCodec.java:92)
>         at org.apache.hadoop.io.compress.GzipCodec$GzipInputStream.<init>(GzipCodec.java:101)
>         at org.apache.hadoop.io.compress.GzipCodec.createInputStream(GzipCodec.java:169)
>         at org.apache.hadoop.io.compress.GzipCodec.createInputStream(GzipCodec.java:179)
>         at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1520)
>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1428)
>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
>         at org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:43)
>         at org.apache.hadoop.mapred.SequenceFileAsTextRecordReader.<init>(SequenceFileAsTextRecordReader.java:44)
>         at org.apache.hadoop.mapred.SequenceFileAsTextInputFormat.getRecordReader(SequenceFileAsTextInputFormat.java:43)
>         at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:296)
>         at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:311)
>         ... 21 more
>
> Does anyone have a clue about what I am doing wrong?
The SequenceFileAsTextInputFormat converts the sequence record values to
strings using the toString() invocation. Assuming that your data has a
custom writable with multiple fields in it, I don't think it is possible
for you to map the individual bits to different columns. Can you try
doing the following:

create external table dummy( fullvalue string )
stored as
inputformat 'org.apache.hadoop.mapred.SequenceFileAsTextInputFormat'
outputformat 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
location '/home/edward/Downloads/seq';

and then doing a select * from dummy?

Arvind
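
To make the toString() point concrete, here is a minimal sketch of the kind
of custom value writable this approach presumes. The class and field names
(MultiFieldValue, field1 through field3) are hypothetical stand-ins for
whatever Sagar's actual writable looks like; the point is that
SequenceFileAsTextInputFormat hands Hive nothing but the toString() output,
so a table declared with row format delimited fields terminated by ',' can
only recover multiple columns if toString() happens to emit delimited text.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

// Hypothetical custom value writable with three subfields. When read
// through SequenceFileAsTextInputFormat, Hive sees only the toString()
// output of each value.
public class MultiFieldValue implements Writable {
  private long field1;
  private String field2 = "";
  private long field3;

  public void write(DataOutput out) throws IOException {
    out.writeLong(field1);
    out.writeUTF(field2);
    out.writeLong(field3);
  }

  public void readFields(DataInput in) throws IOException {
    field1 = in.readLong();
    field2 = in.readUTF();
    field3 = in.readLong();
  }

  // Emit comma-delimited text so that a table declared with
  // "row format delimited fields terminated by ','" can split the
  // string back into three columns.
  public String toString() {
    return field1 + "," + field2 + "," + field3;
  }
}

If the writable already emits delimited text like this, Edward's
keyonly-style table should be able to split it into columns; if it does not
and cannot be changed, the SerDe or custom RecordReader route from the
earlier reply is the way to keep per-field access.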

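On the EOFException itself: the causal chain bottoms out in
GZIPInputStream.readHeader while SequenceFile.Reader is opening a
gzip-compressed stream, which happens before the table definition is
consulted at all. One way to narrow it down is to read the file directly
with SequenceFile.Reader, outside of Hive, and print what its header claims
about the key/value classes and the codec. A minimal sketch against the
Hadoop 0.20 API, using Edward's path; the class name InspectSeqFile is made
up, and any custom writable classes must be on the classpath for the
record-read step to work:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

public class InspectSeqFile {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path path = new Path("/home/edward/Downloads/seq/seq");
    FileSystem fs = path.getFileSystem(conf);

    SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
    try {
      // Metadata from the sequence file header -- the same information
      // Hive's record reader acts on when it picks a decompressor.
      System.out.println("key class:        " + reader.getKeyClassName());
      System.out.println("value class:      " + reader.getValueClassName());
      System.out.println("compressed:       " + reader.isCompressed());
      System.out.println("block compressed: " + reader.isBlockCompressed());
      if (reader.getCompressionCodec() != null) {
        System.out.println("codec: "
            + reader.getCompressionCodec().getClass().getName());
      }

      // Try to read the first record using the declared key/value types.
      Writable key = (Writable) ReflectionUtils.newInstance(
          reader.getKeyClass(), conf);
      Writable value = (Writable) ReflectionUtils.newInstance(
          reader.getValueClass(), conf);
      if (reader.next(key, value)) {
        System.out.println("first record: " + key + " => " + value);
      }
    } finally {
      reader.close();
    }
  }
}

If this standalone read fails with the same EOFException, the file or the
gzip codec setup is at fault (for instance, a truncated compressed block)
rather than the DDL; if it reads cleanly, the difference lies in how the
Hive fetch path configures its reader.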