Actually, the HBase documentation discourages physically copying JARs from
the HBase classpath to the Hadoop one:

From the HBase API documentation
(http://hadoop.apache.org/hbase/docs/current/api/org/apache/hadoop/hbase/mapreduce/package-summary.html):

"HBase, MapReduce and the CLASSPATH

MapReduce jobs deployed to a MapReduce cluster do not by default have access
to the HBase configuration under $HBASE_CONF_DIR nor to HBase classes. You
could add hbase-site.xml to $HADOOP_HOME/conf and add hbase jars to the
$HADOOP_HOME/lib and copy these changes across your cluster *but a cleaner
means of adding hbase configuration and classes to the cluster CLASSPATH is
by uncommenting HADOOP_CLASSPATH in $HADOOP_HOME/conf/hadoop-env.sh adding
hbase dependencies here.*"

It seems that the approach in bold is not sufficient and that not all mapred
jobs have access to the required JARs unless the first approach is taken.
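
For reference, the hadoop-env.sh route in question amounts to something like
the following (JAR versions and paths are illustrative and will vary with
your install):

  export HBASE_HOME=/usr/local/hbase
  export HADOOP_CLASSPATH="$HBASE_HOME/hbase-0.20.3.jar:$HBASE_HOME/lib/zookeeper-3.2.2.jar:$HBASE_HOME/conf"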

-GS

On Sat, Apr 10, 2010 at 1:35 PM, Edward Capriolo <[email protected]> wrote:

> On Sat, Apr 10, 2010 at 1:31 PM, George Stathis <[email protected]>
> wrote:
>
> > Ted,
> >
> > HADOOP-6695 is an improvement request and a different issue from what I
> > am encountering. What I am referring to is not a dynamic classloading
> > issue. It happens even after the servers are restarted. You are asking
> > for Hadoop to automatically detect new JARs without restarting when they
> > are placed in its classpath. I'm saying that my MapReduce jobs fail
> > unless some JARs are physically present in the Hadoop lib directory,
> > regardless of server restarts and HADOOP_CLASSPATH settings.
> >
> > I hope this clarifies things.
> >
> > -GS
> >
> > On Sat, Apr 10, 2010 at 1:11 PM, <[email protected]> wrote:
> >
> > > I logged HADOOP-6695
> > >
> > > Cheers
> > > Sent from my Verizon Wireless BlackBerry
> > >
> > > -----Original Message-----
> > > From: George Stathis <[email protected]>
> > > Date: Sat, 10 Apr 2010 12:11:37
> > > To: <[email protected]>
> > > Subject: Re: org.apache.hadoop.hbase.mapreduce.Export fails with an NPE
> > >
> > > OK, the issue remains in our Ubuntu EC2 dev environment, so it's not
> > > just my local setup. Here are some more observations based on some
> > > tests I just ran:
> > >
> > >   - If the zookeeper JAR is omitted from HADOOP_CLASSPATH, then
> > >     ClassNotFoundExceptions are thrown, as would be expected
> > >   - If the zookeeper JAR is included in HADOOP_CLASSPATH, the
> > >     ClassNotFoundExceptions go away, but then the original NPE
> > >     re-appears at
> > >     org.apache.hadoop.hbase.mapreduce.TableInputFormatBase$TableRecordReader.restart(TableInputFormatBase.java:110)
> > >   - If the zookeeper JAR is physically included in $HADOOP_HOME/lib,
> > >     then the NPE goes away as well
> > >
> > > So, while it seems that the HADOOP_CLASSPATH is indeed being read,
> > > something is missing during the MapReduce process that keeps the htable
> > > from being instantiated properly in TableInputFormatBase unless some
> > > JARs are physically present in $HADOOP_HOME/lib. Note that this issue
> > > is not specific to the zookeeper JAR either. We have enabled the
> > > transactional contrib indexed tables and we have the same problem if we
> > > don't physically include hbase-transactional-0.20.3.jar in the hadoop
> > > lib even though it's included in HADOOP_CLASSPATH.
> > >
> > > It feels like there is a discrepancy in the way classloading is done
> > > between the various components. But I'm not sure whether this is even
> > > an HBase issue rather than a Hadoop one. Seems like this might be a
> > > JIRA ticket candidate. Any thoughts on which project should look at
> > > this first?
> > >
> > > -GS
> > >
> > > On Fri, Apr 9, 2010 at 8:29 PM, George Stathis <[email protected]>
> > > wrote:
> > >
> > > > Here is mine:
> > > >
> > > > export HADOOP_CLASSPATH="$HBASE_HOME/hbase-0.20.3.jar:$HBASE_HOME/hbase-0.20.3-test.jar:$HBASE_HOME/lib/zookeeper-3.2.2.jar:$HBASE_HOME/conf"
> > > >
> > > > $HBASE_HOME is defined in my .bash_profile, so it's already there
> > > > and I see it expanded in the debug statements with the correct path.
> > > > I even tried hard-coding the $HBASE_HOME path above just in case and
> > > > I had the same issue.
> > > >
> > > > In any case, I'm past it now. I'll have to check whether the same
> > > > issue happens on our dev environment running on Ubuntu on EC2. If
> > > > not, then at least it's localized to my OSX environment.
> > > >
> > > > -GS
> > > >
> > > >
> > > > On Fri, Apr 9, 2010 at 7:32 PM, Stack <[email protected]> wrote:
> > > >
> > > >> Very odd.  I don't have to do that running MR jobs.  I wonder what's
> > > >> different? (I'm using a 0.20.4 near-candidate rather than 0.20.3,
> > > >> on 1.6.0u14.)  I have a hadoop-env.sh like this:
> > > >>
> > > >> export HBASE_HOME=/home/hadoop/0.20
> > > >> export HBASE_VERSION=20.4-dev
> > > >> #export HADOOP_CLASSPATH="$HBASE_HOME/conf:$HBASE_HOME/build/hbase-0.20.4-dev.jar:$HBASE_HOME/build/hbase-0.20.4-dev-test.jar:$HBASE_HOME/lib/zookeeper-3.2.2.jar"
> > > >> export HADOOP_CLASSPATH="$HBASE_HOME/conf:$HBASE_HOME/build/hbase-0.${HBASE_VERSION}.jar:$HBASE_HOME/build/hbase-0.${HBASE_VERSION}-test.jar:$HBASE_HOME/lib/zookeeper-3.2.2.jar"
> > > >>
> > > >> St.Ack
> > > >>
> > > >> On Fri, Apr 9, 2010 at 4:19 PM, George Stathis <[email protected]>
> > > >> wrote:
> > > >> > Solved: for those interested, I had to explicitly copy
> > > >> > zookeeper-3.2.2.jar to $HADOOP_HOME/lib even though I had added
> > > >> > its path to $HADOOP_CLASSPATH under
> > > >> > $HADOOP_HOME/conf/hadoop-env.sh.
> > > >> >
> > > >> > It makes no sense to me why that particular JAR would not get
> > > >> > picked up. It was even listed in the classpath debug output when I
> > > >> > ran the job using the hadoop shell script. If anyone can
> > > >> > enlighten, please do.
> > > >> >
> > > >> > -GS
> > > >> >
> > > >> > On Fri, Apr 9, 2010 at 5:56 PM, George Stathis
> > > >> > <[email protected]> wrote:
> > > >> >
> > > >> >> No dice. Classpath is now set. Same error. Meanwhile, I'm running
> > > >> >> "$ hadoop org.apache.hadoop.hbase.PerformanceEvaluation
> > > >> >> sequentialWrite 1" just fine, so MapReduce is working at least.
> > > >> >>
> > > >> >> Still looking for suggestions then I guess.
> > > >> >>
> > > >> >> -GS
> > > >> >>
> > > >> >>
> > > >> >> On Fri, Apr 9, 2010 at 5:31 PM, George Stathis
> > > >> >> <[email protected]> wrote:
> > > >> >>
> > > >> >>> RTFMing
> > > >> >>> http://hadoop.apache.org/hbase/docs/current/api/org/apache/hadoop/hbase/mapreduce/package-summary.html
> > > >> >>> right now... Hadoop classpath not being set properly could be
> > > >> >>> the issue...
> > > >> >>>
> > > >> >>>
> > > >> >>> On Fri, Apr 9, 2010 at 5:26 PM, George Stathis
> > > >> >>> <[email protected]> wrote:
> > > >> >>>
> > > >> >>>> Hi folks,
> > > >> >>>>
> > > >> >>>> I hope this is just a newbie problem.
> > > >> >>>>
> > > >> >>>> Context:
> > > >> >>>> - Running 0.20.3 tag locally in pseudo cluster mode
> > > >> >>>> - $HBASE_HOME is in env and $PATH
> > > >> >>>> - Running org.apache.hadoop.hbase.mapreduce.Export in the shell,
> > > >> >>>>   such as: $ hbase org.apache.hadoop.hbase.mapreduce.Export
> > > >> >>>>   channels /bkps/channels/01
> > > >> >>>>
> > > >> >>>> Symptom:
> > > >> >>>> - Getting an NPE at
> > > >> >>>>   org.apache.hadoop.hbase.mapreduce.TableInputFormatBase$TableRecordReader.restart(TableInputFormatBase.java:110):
> > > >> >>>>
> > > >> >>>> [...]
> > > >> >>>> 110      this.scanner = this.htable.getScanner(newScan);
> > > >> >>>> [...]
> > > >> >>>>
> > > >> >>>> Full output is below. Not sure why htable is still null at
> > > >> >>>> that point. User error?
> > > >> >>>>
> > > >> >>>> Any help is appreciated.
> > > >> >>>>
> > > >> >>>> -GS
> > > >> >>>>
> > > >> >>>> Full output:
> > > >> >>>>
> > > >> >>>> $ hbase org.apache.hadoop.hbase.mapreduce.Export channels
> > > >> >>>> /bkps/channels/01
> > > >> >>>> 2010-04-09 17:13:57.407::INFO:  Logging to STDERR via
> > > >> >>>> org.mortbay.log.StdErrLog
> > > >> >>>> 2010-04-09 17:13:57.408::INFO:  verisons=1, starttime=0,
> > > >> >>>> endtime=9223372036854775807
> > > >> >>>> 10/04/09 17:13:58 DEBUG zookeeper.ZooKeeperWrapper: Read ZNode
> > > >> >>>> /hbase/root-region-server got 192.168.1.16:52159
> > > >> >>>> 10/04/09 17:13:58 DEBUG client.HConnectionManager$TableServers:
> > > >> >>>> Found ROOT at 192.168.1.16:52159
> > > >> >>>> 10/04/09 17:13:58 DEBUG client.HConnectionManager$TableServers:
> > > >> >>>> Cached location for .META.,,1 is 192.168.1.16:52159
> > > >> >>>> 10/04/09 17:13:58 DEBUG client.HConnectionManager$TableServers:
> > > >> >>>> Cached location for channels,,1270753106916 is 192.168.1.16:52159
> > > >> >>>> 10/04/09 17:13:58 DEBUG client.HConnectionManager$TableServers:
> > > >> >>>> Cache hit for row <> in tableName channels: location server
> > > >> >>>> 192.168.1.16:52159, location region name channels,,1270753106916
> > > >> >>>> 10/04/09 17:13:58 DEBUG mapreduce.TableInputFormatBase: getSplits:
> > > >> >>>> split -> 0 -> 192.168.1.16:,
> > > >> >>>> 10/04/09 17:13:58 INFO mapred.JobClient: Running job:
> > > >> >>>> job_201004091642_0009
> > > >> >>>> 10/04/09 17:13:59 INFO mapred.JobClient:  map 0% reduce 0%
> > > >> >>>> 10/04/09 17:14:09 INFO mapred.JobClient: Task Id :
> > > >> >>>> attempt_201004091642_0009_m_000000_0, Status : FAILED
> > > >> >>>> java.lang.NullPointerException
> > > >> >>>>   at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase$TableRecordReader.restart(TableInputFormatBase.java:110)
> > > >> >>>>   at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase$TableRecordReader.init(TableInputFormatBase.java:119)
> > > >> >>>>   at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.createRecordReader(TableInputFormatBase.java:262)
> > > >> >>>>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:588)
> > > >> >>>>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> > > >> >>>>   at org.apache.hadoop.mapred.Child.main(Child.java:170)
> > > >> >>>>
> > > >> >>>> 10/04/09 17:14:15 INFO mapred.JobClient: Task Id :
> > > >> >>>> attempt_201004091642_0009_m_000000_1, Status : FAILED
> > > >> >>>> java.lang.NullPointerException
> > > >> >>>>   at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase$TableRecordReader.restart(TableInputFormatBase.java:110)
> > > >> >>>>   at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase$TableRecordReader.init(TableInputFormatBase.java:119)
> > > >> >>>>   at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.createRecordReader(TableInputFormatBase.java:262)
> > > >> >>>>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:588)
> > > >> >>>>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> > > >> >>>>   at org.apache.hadoop.mapred.Child.main(Child.java:170)
> > > >> >>>>
> > > >> >>>> 10/04/09 17:14:21 INFO mapred.JobClient: Task Id :
> > > >> >>>> attempt_201004091642_0009_m_000000_2, Status : FAILED
> > > >> >>>> java.lang.NullPointerException
> > > >> >>>>   at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase$TableRecordReader.restart(TableInputFormatBase.java:110)
> > > >> >>>>   at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase$TableRecordReader.init(TableInputFormatBase.java:119)
> > > >> >>>>   at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.createRecordReader(TableInputFormatBase.java:262)
> > > >> >>>>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:588)
> > > >> >>>>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> > > >> >>>>   at org.apache.hadoop.mapred.Child.main(Child.java:170)
> > > >> >>>>
> > > >> >>>> 10/04/09 17:14:30 INFO mapred.JobClient: Job complete:
> > > >> >>>> job_201004091642_0009
> > > >> >>>> 10/04/09 17:14:30 INFO mapred.JobClient: Counters: 3
> > > >> >>>> 10/04/09 17:14:30 INFO mapred.JobClient:   Job Counters
> > > >> >>>> 10/04/09 17:14:30 INFO mapred.JobClient:     Launched map tasks=4
> > > >> >>>> 10/04/09 17:14:30 INFO mapred.JobClient:     Data-local map tasks=4
> > > >> >>>> 10/04/09 17:14:30 INFO mapred.JobClient:     Failed map tasks=1
> > > >> >>>> 10/04/09 17:14:30 DEBUG zookeeper.ZooKeeperWrapper: Closed
> > > >> >>>> connection with ZooKeeper
> > > >> >>>>
> > > >> >>>>
> > > >> >>>>
> > > >> >>>>
> > > >> >>>
> > > >> >>
> > > >> >
> > > >>
> > > >
> > > >
> > >
> > >
> >
>
> I know that adding the hbase jars to the hadoop classpath is one of the
> suggested methods. Personally I like the one-big-jar approach. Rationale:
> system administration. Say you are using Hadoop X.Y.Z and you are adding
> this post-install work: copying libraries, editing files, etc. Now when
> you update HBase you have to do that work again, or you update Hadoop and
> you have to do that work again. You are doubling your administrative
> workload with every upgrade to either Hadoop or HBase.
>
> On the other side of the coin, Eclipse has a FAT JAR plugin that builds
> one big jar. A big jar means a little longer to start the job, but that
> is negligible.
>
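
For what it's worth, a similar end result can be had without the Eclipse
plugin: Hadoop also adds any JARs bundled in a lib/ directory inside the job
JAR to the task classpath. A rough sketch, assuming the job classes were
already compiled into build/ (all names and paths here are illustrative):

  mkdir -p build/lib
  cp $HBASE_HOME/hbase-0.20.3.jar $HBASE_HOME/lib/zookeeper-3.2.2.jar build/lib/
  jar cf myjob.jar -C build .              # packages classes plus bundled lib/
  hadoop jar myjob.jar com.example.MyJob   # MyJob is a hypothetical driver class

This keeps the HBase dependencies inside the job artifact itself, so neither
$HADOOP_HOME/lib nor HADOOP_CLASSPATH needs to change on upgrades.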
