On Thu, Apr 15, 2010 at 7:23 PM, Arvind Prabhakar <[email protected]> wrote:
> On Thu, Apr 15, 2010 at 1:23 PM, Edward Capriolo <[email protected]> wrote:
>
>> On Thu, Apr 15, 2010 at 3:00 PM, Arvind Prabhakar <[email protected]> wrote:
>>
>>> Hi Sagar,
>>>
>>> It looks like your source file has custom writable types in it. If that
>>> is the case, implementing a SerDe that works with that type may not be
>>> straightforward, although it is doable.
>>>
>>> An alternative would be to implement a custom RecordReader that converts
>>> the value of your custom writable to a Struct type, which can then be
>>> queried directly.
>>>
>>> Arvind
>>>
>>> On Thu, Apr 15, 2010 at 1:06 AM, Sagar Naik <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> My data is in the value field of a sequence file. The value field has
>>>> subfields in it, and I am trying to create a table using those
>>>> subfields. For example:
>>>>
>>>>   <KEY> <VALUE>
>>>>   <KEY_FIELD1, KEY_FIELD2> forms the key
>>>>   <VALUE_FIELD1, VALUE_FIELD2, VALUE_FIELD3> forms the value
>>>>
>>>> So I am trying to create a table from VALUE_FIELD*:
>>>>
>>>>   CREATE EXTERNAL TABLE table_name (VALUE_FIELD1 BIGINT,
>>>>   VALUE_FIELD2 STRING, VALUE_FIELD3 BIGINT) STORED AS SEQUENCEFILE;
>>>>
>>>> I am planning to write a custom SerDe implementation and a custom
>>>> SequenceFileReader. Please let me know if I am on the right track.
>>>>
>>>> -Sagar
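
A minimal sketch of the RecordReader approach Arvind describes above. The
MyKeyWritable/MyValueWritable classes and their getField*() accessors are
hypothetical stand-ins for the actual custom types, which are not shown in
this thread; and rather than producing a full Struct, this variant flattens
the value subfields into ^A-delimited Text so Hive's default LazySimpleSerDe
can map them onto columns:

import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.SequenceFileRecordReader;

// Sketch only: MyKeyWritable/MyValueWritable and their getField*()
// accessors stand in for the actual custom writables in the file.
public class MyValueInputFormat extends FileInputFormat<Text, Text> {

  public RecordReader<Text, Text> getRecordReader(InputSplit split,
      JobConf job, Reporter reporter) throws IOException {
    return new MyValueRecordReader(job, (FileSplit) split);
  }

  static class MyValueRecordReader implements RecordReader<Text, Text> {
    private final SequenceFileRecordReader<MyKeyWritable, MyValueWritable> reader;
    private final MyKeyWritable key = new MyKeyWritable();
    private final MyValueWritable value = new MyValueWritable();

    MyValueRecordReader(JobConf job, FileSplit split) throws IOException {
      reader = new SequenceFileRecordReader<MyKeyWritable, MyValueWritable>(job, split);
    }

    public boolean next(Text k, Text v) throws IOException {
      if (!reader.next(key, value)) {
        return false;
      }
      // Flatten the value subfields into ^A-delimited text so the
      // default LazySimpleSerDe can split them back into columns.
      k.set(key.toString());
      v.set(value.getField1() + "\001" + value.getField2() + "\001"
          + value.getField3());
      return true;
    }

    public Text createKey() { return new Text(); }
    public Text createValue() { return new Text(); }
    public long getPos() throws IOException { return reader.getPos(); }
    public float getProgress() throws IOException { return reader.getProgress(); }
    public void close() throws IOException { reader.close(); }
  }
}

The CREATE EXTERNAL TABLE would then name this class in its INPUTFORMAT
clause and declare the three columns normally; since ^A is Hive's default
field delimiter, no explicit ROW FORMAT clause is strictly needed.
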
>> I am actually having lots of trouble with this. I have a sequence file
>> that opens fine with:
>>
>>   /home/edward/hadoop/hadoop-0.20.2/bin/hadoop dfs -text /home/edward/Downloads/seq/seq
>>
>>   create external table keyonly (ver string, theid int, thedate string)
>>   row format delimited fields terminated by ','
>>   stored as
>>     inputformat 'org.apache.hadoop.mapred.SequenceFileAsTextInputFormat'
>>     outputformat 'org.apache.hadoop.hive.ql.io.HiveNullValueSequenceFileOutputFormat'
>>   location '/home/edward/Downloads/seq';
>>
>> I also tried
>>
>>   inputformat 'org.apache.hadoop.mapred.SequenceFileInputFormat'
>>
>> or stored as SEQUENCEFILE. I always get this:
>>
>> 2010-04-15 13:10:43,849 ERROR CliDriver (SessionState.java:printError(255))
>> - Failed with exception java.io.IOException:java.io.EOFException
>> java.io.IOException: java.io.EOFException
>>     at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:332)
>>     at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:120)
>>     at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:681)
>>     at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:146)
>>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
>>     at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:510)
>>     at org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_key_only(TestCliDriver.java:79)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>     at junit.framework.TestCase.runTest(TestCase.java:154)
>>     at junit.framework.TestCase.runBare(TestCase.java:127)
>>     at junit.framework.TestResult$1.protect(TestResult.java:106)
>>     at junit.framework.TestResult.runProtected(TestResult.java:124)
>>     at junit.framework.TestResult.run(TestResult.java:109)
>>     at junit.framework.TestCase.run(TestCase.java:118)
>>     at junit.framework.TestSuite.runTest(TestSuite.java:208)
>>     at junit.framework.TestSuite.run(TestSuite.java:203)
>>     at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:422)
>>     at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:931)
>>     at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:785)
>> Caused by: java.io.EOFException
>>     at java.util.zip.GZIPInputStream.readUByte(GZIPInputStream.java:207)
>>     at java.util.zip.GZIPInputStream.readUShort(GZIPInputStream.java:197)
>>     at java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:136)
>>     at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:58)
>>     at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:68)
>>     at org.apache.hadoop.io.compress.GzipCodec$GzipInputStream$ResetableGZIPInputStream.<init>(GzipCodec.java:92)
>>     at org.apache.hadoop.io.compress.GzipCodec$GzipInputStream.<init>(GzipCodec.java:101)
>>     at org.apache.hadoop.io.compress.GzipCodec.createInputStream(GzipCodec.java:169)
>>     at org.apache.hadoop.io.compress.GzipCodec.createInputStream(GzipCodec.java:179)
>>     at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1520)
>>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1428)
>>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
>>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
>>     at org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:43)
>>     at org.apache.hadoop.mapred.SequenceFileAsTextRecordReader.<init>(SequenceFileAsTextRecordReader.java:44)
>>     at org.apache.hadoop.mapred.SequenceFileAsTextInputFormat.getRecordReader(SequenceFileAsTextInputFormat.java:43)
>>     at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:296)
>>     at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:311)
>>     ... 21 more
>>
>> Does anyone have a clue on what I am doing wrong?
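
The trace shows the EOFException being thrown out of GzipCodec while Hive's
FetchOperator is still initializing the SequenceFile.Reader, before any
records are read. One way to compare against what hadoop dfs -text does is
to open the file with SequenceFile.Reader directly and print its header; a
minimal sketch (the SeqDump class name is made up, and the path comes in as
an argument):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

// Opens a sequence file outside Hive and prints its header and records.
public class SeqDump {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path path = new Path(args[0]); // e.g. /home/edward/Downloads/seq/seq
    FileSystem fs = path.getFileSystem(conf);
    SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
    System.out.println("key class:   " + reader.getKeyClassName());
    System.out.println("value class: " + reader.getValueClassName());
    System.out.println("compressed:  " + reader.isCompressed()
        + " (codec: " + (reader.getCompressionCodec() == null
            ? "none" : reader.getCompressionCodec().getClass().getName()) + ")");
    Writable key = (Writable) ReflectionUtils.newInstance(
        reader.getKeyClass(), conf);
    Writable value = (Writable) ReflectionUtils.newInstance(
        reader.getValueClass(), conf);
    while (reader.next(key, value)) {
      System.out.println(key + "\t" + value);
    }
    reader.close();
  }
}
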
> The SequenceFileAsTextInputFormat converts the sequence record values to
> string using the toString() invocation. Assuming that your data has a
> custom writable with multiple fields in it, I don't think it is possible
> for you to map the individual bits to different columns.
>
> Can you try doing the following:
>
>   create external table dummy (fullvalue string)
>   stored as
>     inputformat 'org.apache.hadoop.mapred.SequenceFileAsTextInputFormat'
>     outputformat 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
>   location '/home/edward/Downloads/seq';
>
> and then doing a select * from dummy.
>
> Arvind
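
To make the toString() point concrete: whatever the custom writable's
toString() returns is exactly what SequenceFileAsTextInputFormat hands Hive
for the fullvalue column. A toy stand-in for such a writable
(DemoValueWritable is made up, not the type in the file):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

// Toy writable with three subfields, standing in for the real custom type.
public class DemoValueWritable implements Writable {
  private long field1;
  private String field2;
  private long field3;

  public void write(DataOutput out) throws IOException {
    out.writeLong(field1);
    out.writeUTF(field2);
    out.writeLong(field3);
  }

  public void readFields(DataInput in) throws IOException {
    field1 = in.readLong();
    field2 = in.readUTF();
    field3 = in.readLong();
  }

  // SequenceFileAsTextInputFormat gives Hive exactly this string; a
  // ^A-delimited toString() would at least let LazySimpleSerDe split
  // it back into columns.
  @Override
  public String toString() {
    return field1 + "\001" + field2 + "\001" + field3;
  }
}

If the real type's toString() happens to emit a stable delimiter, the fields
could be split back out with ROW FORMAT DELIMITED; otherwise a custom
RecordReader like the earlier sketch is needed to recover the subfields.
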
[edw...@ec hive]$ head -1 /home/edward/Downloads/seq/seq | od -a
0000000   S   E   Q ack  em   o   r   g   .   a   p   a   c   h   e   .
0000020   h   a   d   o   o   p   .   i   o   .   T   e   x   t  em   o
0000040   r   g   .   a   p   a   c   h   e   .   h   a   d   o   o   p
0000060   .   i   o   .   T   e   x   t soh soh   '   o   r   g   .   a
0000100   p   a   c   h   e   .   h   a   d   o   o   p   .   i   o   .
0000120   c   o   m   p   r   e   s   s   .   G   z   i   p   C   o   d
0000140   e   c nul nul nul nul   =   4  ff   Y   F   s   V  so   4   "
0000160   R   +   X enq dle   T del del del del   =   4  ff   Y   F   s
0000200   V  so   4   "   R   +   X enq dle   T soh etb  us  vt  bs nul

2010-04-15 18:45:24,954 ERROR CliDriver (SessionState.java:printError(255))
- Failed with exception java.io.IOException:java.io.EOFException
java.io.IOException: java.io.EOFException
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:332)
    at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:120)
    at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:681)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:146)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
    at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:510)
    at org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_key_only(TestCliDriver.java:79)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at junit.framework.TestCase.runTest(TestCase.java:154)
    at junit.framework.TestCase.runBare(TestCase.java:127)
    at junit.framework.TestResult$1.protect(TestResult.java:106)
    at junit.framework.TestResult.runProtected(TestResult.java:124)
    at junit.framework.TestResult.run(TestResult.java:109)
    at junit.framework.TestCase.run(TestCase.java:118)
    at junit.framework.TestSuite.runTest(TestSuite.java:208)
    at junit.framework.TestSuite.run(TestSuite.java:203)
    at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:422)
    at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:931)
    at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:785)
Caused by: java.io.EOFException
    at java.util.zip.GZIPInputStream.readUByte(GZIPInputStream.java:207)
    at java.util.zip.GZIPInputStream.readUShort(GZIPInputStream.java:197)
    at java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:136)
    at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:58)
    at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:68)
    at org.apache.hadoop.io.compress.GzipCodec$GzipInputStream$ResetableGZIPInputStream.<init>(GzipCodec.java:92)
    at org.apache.hadoop.io.compress.GzipCodec$GzipInputStream.<init>(GzipCodec.java:101)
    at org.apache.hadoop.io.compress.GzipCodec.createInputStream(GzipCodec.java:169)
    at org.apache.hadoop.io.compress.GzipCodec.createInputStream(GzipCodec.java:179)
    at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1520)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1428)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
    at org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:43)
    at org.apache.hadoop.mapred.SequenceFileAsTextRecordReader.<init>(SequenceFileAsTextRecordReader.java:44)
    at org.apache.hadoop.mapred.SequenceFileAsTextInputFormat.getRecordReader(SequenceFileAsTextInputFormat.java:43)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:296)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:311)
    ... 21 more
