On Thu, Apr 15, 2010 at 3:00 PM, Arvind Prabhakar <[email protected]> wrote:
> Hi Sagar,
>
> Looks like your source file has custom writable types in it. If that is
> the case, implementing a SerDe that works with that type may not be that
> straightforward, although it is doable.
>
> An alternative would be to implement a custom RecordReader that converts
> the value of your custom writable to a Struct type, which can then be
> queried directly.
>
> Arvind
>
>
> On Thu, Apr 15, 2010 at 1:06 AM, Sagar Naik <[email protected]> wrote:
>
>> Hi,
>>
>> My data is in the value field of a sequence file. The value field has
>> subfields in it, and I am trying to create a table using these subfields.
>> Example:
>>
>> <KEY> <VALUE>
>> <KEY_FIELD1, KEY_FIELD2> forms the key
>> <VALUE_FIELD1, VALUE_FIELD2, VALUE_FIELD3> forms the value
>>
>> So I am trying to create a table from VALUE_FIELD*:
>>
>> CREATE EXTERNAL TABLE table_name (VALUE_FIELD1 BIGINT, VALUE_FIELD2
>> STRING, VALUE_FIELD3 BIGINT) STORED AS SEQUENCEFILE;
>>
>> I am planning to write a custom SerDe implementation and a custom
>> SequenceFileReader. Please let me know if I am on the right track.
>>
>> -Sagar
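For illustration, here is a rough, untested sketch of the RecordReader approach Arvind describes. "CustomWritable" and its getField*() accessors are invented placeholders for whatever writable actually sits in the value slot of the file; the idea is to flatten the subfields into delimited Text that Hive's default LazySimpleSerDe can split into columns, avoiding a full custom SerDe:

// Untested sketch. CustomWritable and its getField*() methods are
// placeholders for the actual custom writable stored in the value.
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.SequenceFileRecordReader;

public class CustomWritableRecordReader
    implements RecordReader<LongWritable, Text> {

  private final SequenceFileRecordReader<Writable, CustomWritable> reader;
  private final Writable key;
  private final CustomWritable value;
  private long row = 0;

  public CustomWritableRecordReader(JobConf conf, InputSplit split)
      throws IOException {
    reader = new SequenceFileRecordReader<Writable, CustomWritable>(
        conf, (FileSplit) split);
    key = reader.createKey();
    value = reader.createValue();
  }

  public boolean next(LongWritable k, Text v) throws IOException {
    if (!reader.next(key, value)) {
      return false;
    }
    k.set(row++);
    // Flatten the subfields into a ^A-separated line, which Hive's
    // default delimited SerDe can split into table columns.
    v.set(value.getField1() + "\001"
        + value.getField2() + "\001"
        + value.getField3());
    return true;
  }

  public LongWritable createKey() { return new LongWritable(); }
  public Text createValue() { return new Text(); }
  public long getPos() throws IOException { return reader.getPos(); }
  public float getProgress() throws IOException { return reader.getProgress(); }
  public void close() throws IOException { reader.close(); }
}

Returned from a custom InputFormat's getRecordReader(), something like this would let the table be declared with plain BIGINT/STRING columns and ROW FORMAT DELIMITED, with no custom SerDe needed.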
I am actually having lots of trouble with this. I have a sequence file that
opens fine with:

/home/edward/hadoop/hadoop-0.20.2/bin/hadoop dfs -text /home/edward/Downloads/seq/seq

I created the table like this:

create external table keyonly (ver string, theid int, thedate string)
row format delimited fields terminated by ','
stored as
  inputformat 'org.apache.hadoop.mapred.SequenceFileAsTextInputFormat'
  outputformat 'org.apache.hadoop.hive.ql.io.HiveNullValueSequenceFileOutputFormat'
location '/home/edward/Downloads/seq';

I also tried inputformat 'org.apache.hadoop.mapred.SequenceFileInputFormat',
and plain stored as SEQUENCEFILE. I always get this:

2010-04-15 13:10:43,849 ERROR CliDriver (SessionState.java:printError(255)) - Failed with exception java.io.IOException:java.io.EOFException
java.io.IOException: java.io.EOFException
        at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:332)
        at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:120)
        at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:681)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:146)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
        at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:510)
        at org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_key_only(TestCliDriver.java:79)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at junit.framework.TestCase.runTest(TestCase.java:154)
        at junit.framework.TestCase.runBare(TestCase.java:127)
        at junit.framework.TestResult$1.protect(TestResult.java:106)
        at junit.framework.TestResult.runProtected(TestResult.java:124)
        at junit.framework.TestResult.run(TestResult.java:109)
        at junit.framework.TestCase.run(TestCase.java:118)
        at junit.framework.TestSuite.runTest(TestSuite.java:208)
        at junit.framework.TestSuite.run(TestSuite.java:203)
        at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:422)
        at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:931)
        at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:785)
Caused by: java.io.EOFException
        at java.util.zip.GZIPInputStream.readUByte(GZIPInputStream.java:207)
        at java.util.zip.GZIPInputStream.readUShort(GZIPInputStream.java:197)
        at java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:136)
        at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:58)
        at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:68)
        at org.apache.hadoop.io.compress.GzipCodec$GzipInputStream$ResetableGZIPInputStream.<init>(GzipCodec.java:92)
        at org.apache.hadoop.io.compress.GzipCodec$GzipInputStream.<init>(GzipCodec.java:101)
        at org.apache.hadoop.io.compress.GzipCodec.createInputStream(GzipCodec.java:169)
        at org.apache.hadoop.io.compress.GzipCodec.createInputStream(GzipCodec.java:179)
        at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1520)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1428)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
        at org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:43)
        at org.apache.hadoop.mapred.SequenceFileAsTextRecordReader.<init>(SequenceFileAsTextRecordReader.java:44)
        at org.apache.hadoop.mapred.SequenceFileAsTextInputFormat.getRecordReader(SequenceFileAsTextInputFormat.java:43)
        at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:296)
        at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:311)
        ... 21 more

Does anyone have a clue on what I am doing wrong??
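The "Caused by" section is worth a close look: the EOFException comes out of java.util.zip.GZIPInputStream while GzipCodec.createInputStream is being called from SequenceFile$Reader.init, so the reader dies while opening the file, before any rows are fetched. The ResetableGZIPInputStream frames are GzipCodec's pure-Java fallback path, which suggests (the trace alone does not prove it) that the native Hadoop compression libraries were not loaded in the Hive run, while the bin/hadoop invocation that works may have them on java.library.path. A small standalone check can confirm what the file header actually declares; this is an untested sketch, the class name is invented, and the path is taken from the command above:

// Untested sketch: dump what the sequence file header declares.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.util.NativeCodeLoader;

public class SeqFileInfo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path path = new Path("/home/edward/Downloads/seq/seq");
    FileSystem fs = path.getFileSystem(conf);
    SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
    try {
      System.out.println("key class:          " + reader.getKeyClassName());
      System.out.println("value class:        " + reader.getValueClassName());
      System.out.println("compressed:         " + reader.isCompressed());
      System.out.println("block compressed:   " + reader.isBlockCompressed());
      System.out.println("codec:              " + reader.getCompressionCodec());
      System.out.println("native code loaded: " + NativeCodeLoader.isNativeCodeLoaded());
    } finally {
      reader.close();
    }
  }
}

If the file is block compressed with GzipCodec and the native code is not loaded, that combination is a plausible culprit: the reader hands the pure-Java gzip stream an initially empty buffer during init, and the gzip header read then fails with exactly this kind of EOFException.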
