Actually, the HBase documentation discourages physically copying JARs from the HBase classpath to the Hadoop one:
From the HBase API documentation
(http://hadoop.apache.org/hbase/docs/current/api/org/apache/hadoop/hbase/mapreduce/package-summary.html):
"HBase, MapReduce and the CLASSPATH: MapReduce jobs deployed to a MapReduce
cluster do not by default have access to the HBase configuration under
$HBASE_CONF_DIR nor to HBase classes. You could add hbase-site.xml to
$HADOOP_HOME/conf and add hbase jars to $HADOOP_HOME/lib and copy these
changes across your cluster, *but a cleaner means of adding hbase
configuration and classes to the cluster CLASSPATH is by uncommenting
HADOOP_CLASSPATH in $HADOOP_HOME/conf/hadoop-env.sh and adding hbase
dependencies there.*"

It seems that the approach in bold is not sufficient, and that not all
mapred jobs have access to the required JARs unless the first approach is
taken.

-GS
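For concreteness, a minimal sketch of the two approaches the documentation
describes. The jar versions match the HBase 0.20.3 / ZooKeeper 3.2.2 setup
discussed in the thread below; adjust them to your install:

    # Approach 1 (what ultimately worked in this thread): physically copy
    # the HBase config and jars into Hadoop's own directories, then repeat
    # on every node in the cluster.
    cp $HBASE_HOME/conf/hbase-site.xml $HADOOP_HOME/conf/
    cp $HBASE_HOME/hbase-0.20.3.jar $HADOOP_HOME/lib/
    cp $HBASE_HOME/lib/zookeeper-3.2.2.jar $HADOOP_HOME/lib/

    # Approach 2 (the documented "cleaner" way): extend HADOOP_CLASSPATH in
    # $HADOOP_HOME/conf/hadoop-env.sh instead of copying files.
    export HADOOP_CLASSPATH="$HBASE_HOME/conf:$HBASE_HOME/hbase-0.20.3.jar:$HBASE_HOME/lib/zookeeper-3.2.2.jar"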
On Sat, Apr 10, 2010 at 1:35 PM, Edward Capriolo <[email protected]> wrote:

> On Sat, Apr 10, 2010 at 1:31 PM, George Stathis <[email protected]> wrote:
>
> > Ted,
> >
> > HADOOP-6695 is an improvement request and a different issue from what I
> > am encountering. What I am referring to is not a dynamic classloading
> > issue. It happens even after the servers are restarted. You are asking
> > for Hadoop to automatically detect new JARs placed in its classpath
> > without restarting. I'm saying that my MapRed jobs fail unless some
> > JARs are physically present in the hadoop lib directory, regardless of
> > server restarts and HADOOP_CLASSPATH settings.
> >
> > I hope this clarifies things.
> >
> > -GS
> >
> > On Sat, Apr 10, 2010 at 1:11 PM, <[email protected]> wrote:
> >
> > > I logged HADOOP-6695
> > >
> > > Cheers
> > > Sent from my Verizon Wireless BlackBerry
> > >
> > > -----Original Message-----
> > > From: George Stathis <[email protected]>
> > > Date: Sat, 10 Apr 2010 12:11:37
> > > To: <[email protected]>
> > > Subject: Re: org.apache.hadoop.hbase.mapreduce.Export fails with an NPE
> > >
> > > OK, the issue remains in our Ubuntu EC2 dev environment, so it's not
> > > just my local setup. Here are some more observations based on some
> > > tests I just ran:
> > >
> > > - If the zookeeper JAR is omitted from HADOOP_CLASSPATH, then
> > >   ClassNotFoundExceptions are thrown, as would be expected.
> > > - If the zookeeper JAR is included in HADOOP_CLASSPATH, the
> > >   ClassNotFoundExceptions go away, but then the original NPE
> > >   re-appears at
> > >   org.apache.hadoop.hbase.mapreduce.TableInputFormatBase$TableRecordReader.restart(TableInputFormatBase.java:110).
> > > - If the zookeeper JAR is physically included in $HADOOP_HOME/lib,
> > >   then the NPE goes away as well.
> > >
> > > So, while it seems that HADOOP_CLASSPATH is indeed being read,
> > > something is missing during the MapRed process that keeps the htable
> > > from being instantiated properly in TableInputFormatBase unless some
> > > JARs are physically present in $HADOOP_HOME/lib. Note that this issue
> > > is not specific to the zookeeper JAR either. We have enabled the
> > > transactional contrib indexed tables, and we have the same problem if
> > > we don't physically include hbase-transactional-0.20.3.jar in the
> > > hadoop lib even though it's included in HADOOP_CLASSPATH.
> > >
> > > It feels like there is a discrepancy in the way classloading is done
> > > between the various components. But I'm not sure whether this is even
> > > an HBase issue and not a Hadoop one. Seems like this might be a JIRA
> > > ticket candidate. Any thoughts on which project should look at this
> > > first?
> > >
> > > -GS
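A variant that does not appear to have been tried in this thread: ship the
missing jars with the job itself via the distributed cache. A minimal
sketch, assuming Export's main() runs its arguments through
GenericOptionsParser (which is what would make the generic -libjars option
available); whether this sidesteps the NPE on 0.20.3 is untested here:

    # Copy the listed jars to the distributed cache so they are appended
    # to each task attempt's classpath, instead of relying on
    # HADOOP_CLASSPATH or $HADOOP_HOME/lib on the tasktracker nodes.
    hbase org.apache.hadoop.hbase.mapreduce.Export \
      -libjars $HBASE_HOME/lib/zookeeper-3.2.2.jar \
      channels /bkps/channels/01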
> > > On Fri, Apr 9, 2010 at 8:29 PM, George Stathis <[email protected]> wrote:
> > >
> > > > Here is mine:
> > > >
> > > > export HADOOP_CLASSPATH="$HBASE_HOME/hbase-0.20.3.jar:$HBASE_HOME/hbase-0.20.3-test.jar:$HBASE_HOME/lib/zookeeper-3.2.2.jar:$HBASE_HOME/conf"
> > > >
> > > > $HBASE_HOME is defined in my .bash_profile, so it's already there
> > > > and I see it expanded in the debug statements with the correct path.
> > > > I even tried hard-coding the $HBASE_HOME path above just in case,
> > > > and I had the same issue.
> > > >
> > > > In any case, I'm past it now. I'll have to check whether the same
> > > > issue happens on our dev environment running on Ubuntu on EC2. If
> > > > not, then at least it's localized to my OSX environment.
> > > >
> > > > -GS
> > > >
> > > > On Fri, Apr 9, 2010 at 7:32 PM, Stack <[email protected]> wrote:
> > > >
> > > > > Very odd. I don't have to do that running MR jobs. I wonder
> > > > > what's different? (I'm using the 0.20.4 near-candidate rather
> > > > > than 0.20.3, 1.6.0u14.) I have a HADOOP_ENV like this:
> > > > >
> > > > > export HBASE_HOME=/home/hadoop/0.20
> > > > > export HBASE_VERSION=20.4-dev
> > > > > #export HADOOP_CLASSPATH="$HBASE_HOME/conf:$HBASE_HOME/build/hbase-0.20.4-dev.jar:$HBASE_HOME/build/hbase-0.20.4-dev-test.jar:$HBASE_HOME/lib/zookeeper-3.2.2.jar"
> > > > > export HADOOP_CLASSPATH="$HBASE_HOME/conf:$HBASE_HOME/build/hbase-0.${HBASE_VERSION}.jar:$HBASE_HOME/build/hbase-0.${HBASE_VERSION}-test.jar:$HBASE_HOME/lib/zookeeper-3.2.2.jar"
> > > > >
> > > > > St.Ack
> > > > >
> > > > > On Fri, Apr 9, 2010 at 4:19 PM, George Stathis <[email protected]> wrote:
> > > > >
> > > > > > Solved: for those interested, I had to explicitly copy
> > > > > > zookeeper-3.2.2.jar to $HADOOP_HOME/lib even though I had added
> > > > > > its path to $HADOOP_CLASSPATH under
> > > > > > $HADOOP_HOME/conf/hadoop-env.sh.
> > > > > >
> > > > > > It makes no sense to me why that particular JAR would not get
> > > > > > picked up. It was even listed in the classpath debug output
> > > > > > when I ran the job using the hadoop shell script. If anyone can
> > > > > > enlighten, please do.
> > > > > >
> > > > > > -GS
> > > > > >
> > > > > > On Fri, Apr 9, 2010 at 5:56 PM, George Stathis <[email protected]> wrote:
> > > > > >
> > > > > > > No dice. Classpath is now set. Same error. Meanwhile, I'm
> > > > > > > running "$ hadoop org.apache.hadoop.hbase.PerformanceEvaluation
> > > > > > > sequentialWrite 1" just fine, so MapRed is working at least.
> > > > > > >
> > > > > > > Still looking for suggestions then, I guess.
> > > > > > >
> > > > > > > -GS
> > > > > > >
> > > > > > > On Fri, Apr 9, 2010 at 5:31 PM, George Stathis <[email protected]> wrote:
> > > > > > >
> > > > > > > > RTFMing
> > > > > > > > http://hadoop.apache.org/hbase/docs/current/api/org/apache/hadoop/hbase/mapreduce/package-summary.html
> > > > > > > > right now... Hadoop classpath not being set properly could
> > > > > > > > be the issue...
> > > > > > > >
> > > > > > > > On Fri, Apr 9, 2010 at 5:26 PM, George Stathis <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > Hi folks,
> > > > > > > > >
> > > > > > > > > I hope this is just a newbie problem.
> > > > > > > > >
> > > > > > > > > Context:
> > > > > > > > > - Running the 0.20.3 tag locally in pseudo-cluster mode
> > > > > > > > > - $HBASE_HOME is in env and $PATH
> > > > > > > > > - Running org.apache.hadoop.hbase.mapreduce.Export in the
> > > > > > > > >   shell, such as:
> > > > > > > > >   $ hbase org.apache.hadoop.hbase.mapreduce.Export channels /bkps/channels/01
> > > > > > > > >
> > > > > > > > > Symptom:
> > > > > > > > > - Getting an NPE at
> > > > > > > > >   org.apache.hadoop.hbase.mapreduce.TableInputFormatBase$TableRecordReader.restart(TableInputFormatBase.java:110):
> > > > > > > > >
> > > > > > > > > [...]
> > > > > > > > > 110 this.scanner = this.htable.getScanner(newScan);
> > > > > > > > > [...]
> > > > > > > > >
> > > > > > > > > Full output is below. Not sure why htable is still null at
> > > > > > > > > that point. User error?
> > > > > > > > >
> > > > > > > > > Any help is appreciated.
> > > > > > > > >
> > > > > > > > > -GS
> > > > > > > > >
> > > > > > > > > Full output:
> > > > > > > > >
> > > > > > > > > $ hbase org.apache.hadoop.hbase.mapreduce.Export channels /bkps/channels/01
> > > > > > > > > 2010-04-09 17:13:57.407::INFO: Logging to STDERR via org.mortbay.log.StdErrLog
> > > > > > > > > 2010-04-09 17:13:57.408::INFO: verisons=1, starttime=0, endtime=9223372036854775807
> > > > > > > > > 10/04/09 17:13:58 DEBUG zookeeper.ZooKeeperWrapper: Read ZNode /hbase/root-region-server got 192.168.1.16:52159
> > > > > > > > > 10/04/09 17:13:58 DEBUG client.HConnectionManager$TableServers: Found ROOT at 192.168.1.16:52159
> > > > > > > > > 10/04/09 17:13:58 DEBUG client.HConnectionManager$TableServers: Cached location for .META.,,1 is 192.168.1.16:52159
> > > > > > > > > 10/04/09 17:13:58 DEBUG client.HConnectionManager$TableServers: Cached location for channels,,1270753106916 is 192.168.1.16:52159
> > > > > > > > > 10/04/09 17:13:58 DEBUG client.HConnectionManager$TableServers: Cache hit for row <> in tableName channels: location server 192.168.1.16:52159, location region name channels,,1270753106916
> > > > > > > > > 10/04/09 17:13:58 DEBUG mapreduce.TableInputFormatBase: getSplits: split -> 0 -> 192.168.1.16:,
> > > > > > > > > 10/04/09 17:13:58 INFO mapred.JobClient: Running job: job_201004091642_0009
> > > > > > > > > 10/04/09 17:13:59 INFO mapred.JobClient: map 0% reduce 0%
> > > > > > > > > 10/04/09 17:14:09 INFO mapred.JobClient: Task Id : attempt_201004091642_0009_m_000000_0, Status : FAILED
> > > > > > > > > java.lang.NullPointerException
> > > > > > > > >   at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase$TableRecordReader.restart(TableInputFormatBase.java:110)
> > > > > > > > >   at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase$TableRecordReader.init(TableInputFormatBase.java:119)
> > > > > > > > >   at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.createRecordReader(TableInputFormatBase.java:262)
> > > > > > > > >   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:588)
> > > > > > > > >   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> > > > > > > > >   at org.apache.hadoop.mapred.Child.main(Child.java:170)
> > > > > > > > >
> > > > > > > > > 10/04/09 17:14:15 INFO mapred.JobClient: Task Id : attempt_201004091642_0009_m_000000_1, Status : FAILED
> > > > > > > > > java.lang.NullPointerException
> > > > > > > > >   at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase$TableRecordReader.restart(TableInputFormatBase.java:110)
> > > > > > > > >   at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase$TableRecordReader.init(TableInputFormatBase.java:119)
> > > > > > > > >   at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.createRecordReader(TableInputFormatBase.java:262)
> > > > > > > > >   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:588)
> > > > > > > > >   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> > > > > > > > >   at org.apache.hadoop.mapred.Child.main(Child.java:170)
> > > > > > > > >
> > > > > > > > > 10/04/09 17:14:21 INFO mapred.JobClient: Task Id : attempt_201004091642_0009_m_000000_2, Status : FAILED
> > > > > > > > > java.lang.NullPointerException
> > > > > > > > >   at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase$TableRecordReader.restart(TableInputFormatBase.java:110)
> > > > > > > > >   at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase$TableRecordReader.init(TableInputFormatBase.java:119)
> > > > > > > > >   at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.createRecordReader(TableInputFormatBase.java:262)
> > > > > > > > >   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:588)
> > > > > > > > >   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> > > > > > > > >   at org.apache.hadoop.mapred.Child.main(Child.java:170)
> > > > > > > > >
> > > > > > > > > 10/04/09 17:14:30 INFO mapred.JobClient: Job complete: job_201004091642_0009
> > > > > > > > > 10/04/09 17:14:30 INFO mapred.JobClient: Counters: 3
> > > > > > > > > 10/04/09 17:14:30 INFO mapred.JobClient:   Job Counters
> > > > > > > > > 10/04/09 17:14:30 INFO mapred.JobClient:     Launched map tasks=4
> > > > > > > > > 10/04/09 17:14:30 INFO mapred.JobClient:     Data-local map tasks=4
> > > > > > > > > 10/04/09 17:14:30 INFO mapred.JobClient:     Failed map tasks=1
> > > > > > > > > 10/04/09 17:14:30 DEBUG zookeeper.ZooKeeperWrapper: Closed connection with ZooKeeper
>
> I know that adding the hbase jars to the hadoop classpath is one of the
> suggested methods. Personally, I like the one-big-jar approach.
> Rationale: system administration. Say you are using Hadoop X.Y.Z and you
> add this post-install work: copying libraries, editing files, etc. Now
> when you update HBase you have to do that work again, or you update
> Hadoop and you have to do that work again. You are doubling your
> administrative workload with every upgrade to either Hadoop or HBase.
>
> On the other side of the coin, Eclipse has a FAT JAR plugin that builds
> one big jar. A big jar means a little longer to start the job, but that
> is negligible.
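For anyone wanting the same effect without the Eclipse plugin: Hadoop
unpacks the job jar on each task node and adds any jars found under a lib/
directory inside it to the task classpath, so dependencies travel with the
job rather than with the cluster. A minimal sketch; build/classes,
myjob.jar, and the com.example.MyExportDriver class are placeholders:

    # Stage the job classes plus dependencies, then jar them up together.
    mkdir -p build/bigjar/lib
    cp -r build/classes/* build/bigjar/
    cp $HBASE_HOME/hbase-0.20.3.jar $HBASE_HOME/lib/zookeeper-3.2.2.jar build/bigjar/lib/
    jar cf myjob.jar -C build/bigjar .

    # Run it; nothing HBase-related needs to live in $HADOOP_HOME/lib.
    hadoop jar myjob.jar com.example.MyExportDriver channels /bkps/channels/01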
