On Thu, Apr 15, 2010 at 1:23 PM, Edward Capriolo <[email protected]> wrote:
>
> On Thu, Apr 15, 2010 at 3:00 PM, Arvind Prabhakar <[email protected]> wrote:
>
>> Hi Sagar,
>>
>> Looks like your source file has custom writable types in it. If that is
>> the case, implementing a SerDe that works with that type may not be
>> straightforward, although it is doable.
>>
>> An alternative would be to implement a custom RecordReader that converts
>> the value of your custom writable to a Struct type, which can then be
>> queried directly.
>>
>> Arvind
>>
>> On Thu, Apr 15, 2010 at 1:06 AM, Sagar Naik <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> My data is in the value field of a sequence file. The value field has
>>> subfields in it, and I am trying to create a table using these subfields.
>>> Example:
>>>
>>> <KEY> <VALUE>
>>> <KEY_FIELD1, KEY_FIELD2> forms the key
>>> <VALUE_FIELD1, VALUE_FIELD2, VALUE_FIELD3> forms the value
>>>
>>> So I am trying to create a table from VALUE_FIELD*:
>>>
>>> CREATE EXTERNAL TABLE table_name (VALUE_FIELD1 BIGINT, VALUE_FIELD2
>>> STRING, VALUE_FIELD3 BIGINT) STORED AS SEQUENCEFILE;
>>>
>>> I am planning to write a custom SerDe implementation and a custom
>>> SequenceFileReader. Please let me know if I am on the right track.
>>>
>>> -Sagar
>
> I am actually having lots of trouble with this.
> I have a sequence file that opens fine with
>
> /home/edward/hadoop/hadoop-0.20.2/bin/hadoop dfs -text /home/edward/Downloads/seq/seq
>
> create external table keyonly( ver string, theid int, thedate string )
> row format delimited fields terminated by ','
> STORED AS
> inputformat 'org.apache.hadoop.mapred.SequenceFileAsTextInputFormat'
> outputformat 'org.apache.hadoop.hive.ql.io.HiveNullValueSequenceFileOutputFormat'
> location '/home/edward/Downloads/seq';
>
> I also tried
> inputformat 'org.apache.hadoop.mapred.SequenceFileInputFormat'
> or stored as SEQUENCEFILE.
>
> I always get this:
>
> 2010-04-15 13:10:43,849 ERROR CliDriver (SessionState.java:printError(255))
> - Failed with exception java.io.IOException:java.io.EOFException
> java.io.IOException: java.io.EOFException
>         at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:332)
>         at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:120)
>         at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:681)
>         at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:146)
>         at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
>         at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:510)
>         at org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_key_only(TestCliDriver.java:79)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at junit.framework.TestCase.runTest(TestCase.java:154)
>         at junit.framework.TestCase.runBare(TestCase.java:127)
>         at junit.framework.TestResult$1.protect(TestResult.java:106)
>         at junit.framework.TestResult.runProtected(TestResult.java:124)
>         at junit.framework.TestResult.run(TestResult.java:109)
>         at junit.framework.TestCase.run(TestCase.java:118)
>         at junit.framework.TestSuite.runTest(TestSuite.java:208)
>         at junit.framework.TestSuite.run(TestSuite.java:203)
>         at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:422)
>         at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:931)
>         at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:785)
> Caused by: java.io.EOFException
>         at java.util.zip.GZIPInputStream.readUByte(GZIPInputStream.java:207)
>         at java.util.zip.GZIPInputStream.readUShort(GZIPInputStream.java:197)
>         at java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:136)
>         at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:58)
>         at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:68)
>         at org.apache.hadoop.io.compress.GzipCodec$GzipInputStream$ResetableGZIPInputStream.<init>(GzipCodec.java:92)
>         at org.apache.hadoop.io.compress.GzipCodec$GzipInputStream.<init>(GzipCodec.java:101)
>         at org.apache.hadoop.io.compress.GzipCodec.createInputStream(GzipCodec.java:169)
>         at org.apache.hadoop.io.compress.GzipCodec.createInputStream(GzipCodec.java:179)
>         at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1520)
>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1428)
>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
>         at org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:43)
>         at org.apache.hadoop.mapred.SequenceFileAsTextRecordReader.<init>(SequenceFileAsTextRecordReader.java:44)
>         at org.apache.hadoop.mapred.SequenceFileAsTextInputFormat.getRecordReader(SequenceFileAsTextInputFormat.java:43)
>         at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:296)
>         at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:311)
>         ... 21 more
>
> Does anyone have a clue about what I am doing wrong?
The SequenceFileAsTextInputFormat converts the sequence record values to
strings using the toString() invocation. Assuming that your data has a
custom writable with multiple fields in it, I don't think it is possible
for you to map the individual bits to different columns. Can you try
doing the following:

create external table dummy( fullvalue string )
stored as
inputformat 'org.apache.hadoop.mapred.SequenceFileAsTextInputFormat'
outputformat 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
location '/home/edward/Downloads/seq';

and then doing a select * from dummy?

Arvind
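
To make the toString() point concrete, here is a minimal sketch of the kind
of custom value writable this approach presumes. The class and field names
(MultiFieldValue, field1 through field3) are hypothetical stand-ins for
whatever Sagar's actual writable looks like; the point is that
SequenceFileAsTextInputFormat hands Hive nothing but the toString() output,
so a table declared with row format delimited fields terminated by ',' can
only recover multiple columns if toString() happens to emit delimited text.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

// Hypothetical custom value writable with three subfields. When read
// through SequenceFileAsTextInputFormat, Hive sees only the toString()
// output of each value.
public class MultiFieldValue implements Writable {
  private long field1;
  private String field2 = "";
  private long field3;

  public void write(DataOutput out) throws IOException {
    out.writeLong(field1);
    out.writeUTF(field2);
    out.writeLong(field3);
  }

  public void readFields(DataInput in) throws IOException {
    field1 = in.readLong();
    field2 = in.readUTF();
    field3 = in.readLong();
  }

  // Emit comma-delimited text so that a table declared with
  // "row format delimited fields terminated by ','" can split the
  // string back into three columns.
  public String toString() {
    return field1 + "," + field2 + "," + field3;
  }
}

If the writable already emits delimited text like this, Edward's
keyonly-style table should be able to split it into columns; if it does not
and cannot be changed, the SerDe or custom RecordReader route from the
earlier reply is the way to keep per-field access.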

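On the EOFException itself: the causal chain bottoms out in
GZIPInputStream.readHeader while SequenceFile.Reader is opening a
gzip-compressed stream, which happens before the table definition is
consulted at all. One way to narrow it down is to read the file directly
with SequenceFile.Reader, outside of Hive, and print what its header claims
about the key/value classes and the codec. A minimal sketch against the
Hadoop 0.20 API, using Edward's path; the class name InspectSeqFile is made
up, and any custom writable classes must be on the classpath for the
record-read step to work:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

public class InspectSeqFile {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path path = new Path("/home/edward/Downloads/seq/seq");
    FileSystem fs = path.getFileSystem(conf);

    SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
    try {
      // Metadata from the sequence file header -- the same information
      // Hive's record reader acts on when it picks a decompressor.
      System.out.println("key class:        " + reader.getKeyClassName());
      System.out.println("value class:      " + reader.getValueClassName());
      System.out.println("compressed:       " + reader.isCompressed());
      System.out.println("block compressed: " + reader.isBlockCompressed());
      if (reader.getCompressionCodec() != null) {
        System.out.println("codec: "
            + reader.getCompressionCodec().getClass().getName());
      }

      // Try to read the first record using the declared key/value types.
      Writable key = (Writable) ReflectionUtils.newInstance(
          reader.getKeyClass(), conf);
      Writable value = (Writable) ReflectionUtils.newInstance(
          reader.getValueClass(), conf);
      if (reader.next(key, value)) {
        System.out.println("first record: " + key + " => " + value);
      }
    } finally {
      reader.close();
    }
  }
}

If this standalone read fails with the same EOFException, the file or the
gzip codec setup is at fault (for instance, a truncated compressed block)
rather than the DDL; if it reads cleanly, the difference lies in how the
Hive fetch path configures its reader.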