Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.
The following page has been changed by stack: http://wiki.apache.org/hadoop/Hbase/MapReduce

The comment on the change is: Simplify

------------------------------------------------------------------------------

= Hbase, MapReduce and the CLASSPATH =

!MapReduce jobs deployed to a mapreduce cluster do not by default have access to the hbase configuration under ''$HBASE_CONF_DIR'' nor to hbase classes.
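Whichever files you end up amending, they must be identical on every node of the mapreduce cluster. As a rough sketch of one way to push them out (hostnames, paths, and passwordless ssh are assumptions about your particular environment), you could loop over the hosts listed in ''$HADOOP_HOME/conf/slaves'':

{{{
# Push the amended hadoop-env.sh (and hbase-site.xml, if you added one)
# to every slave listed in $HADOOP_HOME/conf/slaves.
# Assumes passwordless ssh and an identical $HADOOP_HOME layout on each node.
for host in $(cat "$HADOOP_HOME/conf/slaves"); do
  scp "$HADOOP_HOME/conf/hadoop-env.sh" "$host:$HADOOP_HOME/conf/"
done
}}}

Remember that the tasktrackers only pick up the new configuration after a restart.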
You could add ''hbase-site.xml'' to $HADOOP_HOME/conf and add the hbase jar to $HADOOP_HOME/lib and then copy these changes across your cluster, but the cleanest means of adding hbase configuration and classes to the cluster CLASSPATH is to uncomment ''HADOOP_CLASSPATH'' in ''$HADOOP_HOME/conf/hadoop-env.sh'' and add the path to the hbase jar and to the ''$HBASE_CONF_DIR'' directory. Then copy the amended configuration across the cluster. You'll need to restart the mapreduce cluster if you want it to notice the new configuration.

For example, here is how you would amend ''hadoop-env.sh'', adding hbase classes and the !PerformanceEvaluation class from the hbase test classes to the hadoop ''CLASSPATH'':

{{{
# export HADOOP_CLASSPATH=
export HADOOP_CLASSPATH=$HBASE_HOME/build/test:$HBASE_HOME/build/hadoop-0.15.0-dev-hbase.jar
}}}

Expand $HBASE_HOME appropriately in accordance with your local environment.

And then, this is how you would run the !PerformanceEvaluation MR job to put up 4 clients:

{{{
> $HADOOP_HOME/bin/hadoop org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 4
}}}

The !PerformanceEvaluation class will be found on the CLASSPATH because you added $HBASE_HOME/build/test to HADOOP_CLASSPATH.

= Hbase as MapReduce job data source and sink =
