Re: key part of sequence files

Zheng Shao Thu, 05 Nov 2009 12:52:07 -0800

Hi Bobby,

Can you open a JIRA and attach a patch?
We can put that in contrib.


Zheng


On 11/5/09, Bobby Rullo <[email protected]> wrote:
> Andrey,
>
> Here you go:
>
> http://pastebin.com/m5724ce8a
>
> Bobby
> On Nov 5, 2009, at 8:59 AM, Andrey Pankov wrote:
>
>> Thanks Bobby. Yes, it would be nice to take a look at your class, just
>> to get familiar with it. Could you please post it at pastebin.com? Thanks a
>> lot!
>>
>> On Thu, Nov 5, 2009 at 18:56, Bobby Rullo <[email protected]> wrote:
>>> I had the exact same question, and Zheng told me I had to implement a new
>>> FileInputFormat, so I extended SequenceFileInputFormat, and it worked out
>>> pretty well.
>>>
>>> If you like, I can post the source code somewhere (here?), but it was
>>> pretty easy.
>>>
>>> Bobby
>>> On Nov 5, 2009, at 8:20 AM, Andrey Pankov wrote:
>>>
>>>> Hi guys,
>>>>
>>>> We have a lot of data stored inside compressed SEQ files. Since a SEQ
>>>> file is a sequence of (key, value) pairs, we store one set of columns
>>>> joined by tabs in the key part, and another set of columns, joined the
>>>> same way, in the value part. So our SEQ files are of type (Text, Text).
>>>> Hive does not read such files the way we need, i.e. I'm not satisfied
>>>> by its defaults: it ignores the key part of each record, while the
>>>> value part is deserialized into a set of columns successfully.
>>>> Can someone please point me to how to get Hive not to ignore the SEQ key?
>>>> Thanks.
>>>>
>>>> --
>>>> Andrey Pankov
>>>
>>>
>>
>>
>>
>> --
>> Andrey Pankov
>
>

-- 
Sent from Gmail for mobile | mobile.google.com

Yours,
Zheng
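
For readers following the thread, the idea behind the custom input format
can be sketched roughly as follows. This is not the class from the pastebin
link, whose contents are not shown in this thread; the names KeyValueMerge
and merge are hypothetical. The sketch assumes the key and value are both
tab-joined column strings, as Andrey describes, and shows only the
per-record step such a reader might perform: prepend the key columns to the
value so that a deserializer which looks only at the value sees every column.

```java
// Hypothetical illustration; not part of Hive or Hadoop.
final class KeyValueMerge {
    // Assumed column separator, matching the tab-joined layout in the thread.
    private static final String SEP = "\t";

    // Returns one row containing the key columns followed by the value columns.
    static String merge(String key, String value) {
        return key + SEP + value;
    }

    public static void main(String[] args) {
        // Two key columns plus two value columns become one four-column row.
        String merged = merge("id1\t2009-11-05", "colA\tcolB");
        System.out.println(merged.split(SEP, -1).length); // prints 4
    }
}
```

In an actual Hadoop job this logic would presumably sit inside the record
reader returned by a subclass of SequenceFileInputFormat, which is the
approach Bobby describes above.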
