You can create a table with a custom output format that puts the rows into 
whatever format your other job wants.  See, for example, the hb_range_keys 
table in this doc:

http://wiki.apache.org/hadoop/Hive/HBaseBulkLoad#Prepare_Range_Partitioning
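
For reference, the DDL there is roughly along these lines (from memory, so 
check the doc for the exact statement; the column name and location are just 
illustrative, while the SerDe and output format classes ship with Hive/Hadoop):

  CREATE EXTERNAL TABLE hb_range_keys(transaction_id_range_start STRING)
  ROW FORMAT SERDE
    'org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe'
  STORED AS
    INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveNullValueSequenceFileOutputFormat'
  LOCATION '/tmp/hb_range_keys';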

I added HiveNullValueSequenceFileOutputFormat to get Hive to write output in 
the format needed downstream by the TotalOrderPartitioner (which wants 
everything in the key and null in the value).  You can load your own extension 
classes to do similar things.
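
If you roll your own output format, the pattern is the same; a rough sketch 
(the jar path, class name, and table/column names below are placeholders, and 
your class would need to implement Hive's HiveOutputFormat interface):

  ADD JAR /path/to/my-outputformat.jar;

  CREATE TABLE keyed_output(k STRING, v STRING)
  STORED AS
    INPUTFORMAT 'org.apache.hadoop.mapred.SequenceFileInputFormat'
    OUTPUTFORMAT 'com.example.MyKeyedSequenceFileOutputFormat';

  -- how columns map to key/value is up to your output format and SerDe
  INSERT OVERWRITE TABLE keyed_output
  SELECT key_col, value_col FROM source_table;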

JVS

On Jul 9, 2010, at 9:38 AM, Matt Pestritto wrote:

> Hi.
> 
> Something I noticed is that when I run an insert overwrite table... into 
> sequence files, the key is empty.
> This works as expected for further Hive queries because, as I understand it, 
> Hive only reads the value when running its own queries.
> 
> I have another MR job outside of Hive that needs a key specified, and I want 
> it to consume this same data.
> 
> My question is: can I run an insert overwrite table statement and specify a 
> particular column to use as the key, instead of an empty int writable, in the 
> output sequence file?
> 
> Thanks in advance.
> -Matt
