Andrey,
Here you go:
http://pastebin.com/m5724ce8a
Bobby
On Nov 5, 2009, at 8:59 AM, Andrey Pankov wrote:
Thanks, Bobby. Yes, it would be nice to take a look at your class, just
to get familiar with it. Could you please post it at pastebin.com?
Thanks a lot!
On Thu, Nov 5, 2009 at 18:56, Bobby Rullo <[email protected]> wrote:
I had the exact same question, and Zheng told me I had to implement a
new FileInputFormat, so I extended SequenceFileInputFormat, and it
worked out pretty well.
If you like, I can post the source code somewhere (here?), but it was
pretty easy.
Bobby
On Nov 5, 2009, at 8:20 AM, Andrey Pankov wrote:
Hi guys,
We have a lot of data stored inside compressed SEQ files. Since a SEQ
file is a sequence of (key, value) pairs, we store one set of columns,
joined by tabs, in the key part, and another set of columns, joined the
same way, in the value part. So our SEQ files are of type (Text, Text).
Hive does not read such files the way I need with its defaults: it
ignores the key part of each SEQ record and deserializes only the value
part into a set of columns.
Can someone please point me to how to make Hive not ignore the SEQ key?
Thanks.
--
Andrey Pankov
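
For readers following the thread: the class posted on pastebin is not reproduced here. What follows is only a minimal sketch of the merge step that a custom record reader (for example, one returned by an input format extending SequenceFileInputFormat) might apply to each record so that Hive sees the key columns as well as the value columns. It assumes both key and value are tab-joined text, as described in the original question. The names SeqRowMerger and merge are hypothetical and are not part of Hive or Hadoop.

```java
// Hypothetical helper; not part of Hive, Hadoop, or the class from the thread.
// Assumption: key and value each hold columns joined by a tab character.
final class SeqRowMerger {
    private static final String DELIMITER = "\t";

    private SeqRowMerger() {
        // Utility class, no instances.
    }

    /**
     * Joins the key columns and the value columns into one tab-delimited row.
     * A null key or value is treated as an empty string; the delimiter is
     * always emitted so that the value columns keep their positions.
     */
    static String merge(String key, String value) {
        String k = (key == null) ? "" : key;
        String v = (value == null) ? "" : value;
        return k + DELIMITER + v;
    }
}
```

In a real reader, one plausible place for this is inside the reader's next(key, value) method: read the underlying (Text, Text) pair, then overwrite the value with the merged row before returning it, so that the default deserialization of the value part picks up all columns.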