Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.

The "Hive/HBaseBulkLoad" page has been changed by JohnSichi.
http://wiki.apache.org/hadoop/Hive/HBaseBulkLoad?action=diff&rev1=10&rev2=11

--------------------------------------------------

  set hive.mapred.partitioner=org.apache.hadoop.mapred.lib.TotalOrderPartitioner;
  set total.order.partitioner.natural.order=false;
  set total.order.partitioner.path=/tmp/hb_range_key_list;
+ set hfile.compression='gz';
  
  create table hbsort(transaction_id string, user_name string, amount double, ...)
  stored as
@@ -128, +129 @@

  cluster by transaction_id;
  }}}
  
- The CREATE TABLE creates a dummy table which controls how the output of the 
sort is written.  Note that it uses {{{HiveHFileOutputFormat}}} to do this, 
with the table property {{{hfile.family.path}}} used to control the destination 
directory for the output.  Again, be sure to set the inputformat/outputformat 
exactly as specified.
+ The CREATE TABLE creates a dummy table which controls how the output of the 
sort is written.  Note that it uses {{{HiveHFileOutputFormat}}} to do this, 
with the table property {{{hfile.family.path}}} used to control the destination 
directory for the output.  Again, be sure to set the inputformat/outputformat 
exactly as specified.  In the example above, we select gzip ('gz') compression 
for the result files; if you don't set the {{{hfile.compression}}} parameter, 
no compression will be performed.  (The other method available is 'lzo', which 
compresses less aggressively but does not require as much CPU power.)
  
  The {{{cf}}} in the path specifies the name of the column family which will 
be created in HBase, so the directory name you choose here is important.  (Note 
that we're not actually using an HBase table here; {{{HiveHFileOutputFormat}}} 
writes directly to files.)
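
As a rough illustration of the two paragraphs above, a sort table that writes HFiles for a column family named cf might be declared along the lines of the sketch below. The column list is cut down to the three columns visible in this diff. The destination directory /tmp/hbsort/cf and the fully qualified input/output format class names are assumptions that do not appear in this notification, so check them against the full wiki page before relying on them.

{{{
-- Compression setting exactly as given in the changed page;
-- leave it out to write uncompressed HFiles.
set hfile.compression='gz';

-- Assumed column list: only the columns visible in this diff.
-- Assumed path: the last directory component (cf) becomes the column family name.
create table hbsort(transaction_id string, user_name string, amount double)
stored as
INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.hbase.HiveHFileOutputFormat'
TBLPROPERTIES ('hfile.family.path' = '/tmp/hbsort/cf');
}}}

Because the family name is taken from the directory name, changing the last path component in this sketch would change the column family that the generated files target.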
  
