[Hadoop Wiki] Update of "Hive/HBaseBulkLoad" by JohnSichi

Apache Wiki Fri, 04 Feb 2011 12:49:47 -0800

The "Hive/HBaseBulkLoad" page has been changed by JohnSichi.
http://wiki.apache.org/hadoop/Hive/HBaseBulkLoad?action=diff&rev1=21&rev2=22

--------------------------------------------------

  To sort the data in parallel, we need to range-partition it.  The idea is 
to divide the space of row keys into nearly equal-sized ranges, one for each 
reducer used in the parallel sort.  The details will vary with your source 
data, and you may need to run several exploratory Hive queries to arrive at a 
good enough set of ranges.  Here's one example:
  
  {{{
+ add jar lib/hive_contrib.jar;
  set mapred.reduce.tasks=1;
  create temporary function row_sequence as 
  'org.apache.hadoop.hive.contrib.udf.UDFRowSequence';
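  -- Sketch only, not taken from the page above: one possible way to pick
  -- range boundaries using the row_sequence() function registered here.
  -- The table name (source_keys), the column name (row_key), and the
  -- numbers 1000 and 11 are hypothetical placeholders.
  -- Assumption: with a single reducer, the inner query emits keys in sorted
  -- order, so keeping every 1000th row should give roughly equal-sized
  -- ranges.  N reducers need N-1 boundary keys (here, 11 boundaries for 12
  -- reducers).
  select row_key from
    (select row_key from source_keys order by row_key) x
  where (row_sequence() % 1000) = 0
  order by row_key
  limit 11;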
