I believe I have fixed it.

I am using pig-0.9.2. My cluster runs CDH4b2, but on the client I am not
using the Pig RPM install; I downloaded the tarball from Apache.

Each machine in my cluster has $HADOOP_MAPRED_HOME defined.
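
For reference, that is just an environment export on each node; with the
CDH4 layout it looks something like this (the exact path depends on your
install):

export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce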

I cleaned up my Pig configuration directory so that it contains only the
following jars:

automation.jar
jython-2.5.0.jar
pig-0.9.2-cdh4.0.0b2-core.jar
pig-0.9.2-cdh4.0.0b2.jar
protobuf-java-2.4.0a.jar
slf4j-api-1.6.1.jar
slf4j-log4j12-1.6.1.jar
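
If it helps, here is a sketch of one way to launch Pig so it sees that
directory and the jars in it (the PIG_CLASSPATH loop and the script name
are illustrative, not exactly what I ran):

export PIG_CONF_DIR=/path/to/pig/conf
export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
# append each jar in the conf dir to Pig's classpath
for j in "$PIG_CONF_DIR"/*.jar; do
  PIG_CLASSPATH=${PIG_CLASSPATH:+$PIG_CLASSPATH:}$j
done
export PIG_CLASSPATH
$PIG_HOME/bin/pig myscript.pig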

I installed Pig on one of the cluster machines just to copy its CDH4b2
libraries back to the client. This combination of libraries made Pig
connect to the YARN architecture. Previously I had included basically
every library Hadoop and HBase offered, which was overkill, and the
hadoop-mapreduce-*.jar files were what made Pig fall back to the
LocalJobRunner. After this the job kept failing, and digging through the
log files on HDFS revealed that MRv2 couldn't find TableOutputFormat. To
make TableOutputFormat available to MapReduce jobs I needed to update
yarn-site.xml to include the HBase dependencies:

<property>
  <name>yarn.application.classpath</name>
  <value>
    $HADOOP_CONF_DIR,
    $HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,
    $HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,
    $HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,
    $YARN_HOME/*,$YARN_HOME/lib/*,
    /etc/hbase/conf,
    /usr/lib/hbase/*,/usr/lib/hbase/lib/*
  </value>
  <description>
    Classpath for typical applications.
  </description>
</property>
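
A quick way to sanity-check that those classpath entries actually resolve,
after pushing the updated yarn-site.xml to the nodes and restarting the
NodeManagers (assuming the standard /usr/lib/hbase layout), is:

# confirm the HBase jar is where the classpath entry expects it
ls /usr/lib/hbase/hbase-*.jar
# and confirm the class is actually inside it
unzip -l /usr/lib/hbase/hbase-*.jar | grep TableOutputFormat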

Maybe this will be helpful to someone else.

Thanks.

On Thu, May 3, 2012 at 1:02 AM, Harsh J <ha...@cloudera.com> wrote:

> Kevin,
>
> What version of Pig are you using?
>
> Have you tried setting the right MR home directory to point Pig to the
> local MR configuration for YARN?
>
> $ HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce $PIG_HOME/bin/pig
>
> Usually does it for me, so long as I have
> /usr/lib/hadoop-mapreduce/conf configured properly for YARN+MR (and
> considering that my YARN libs, etc. are all inside
> /usr/lib/hadoop-mapreduce).
>
> On Thu, May 3, 2012 at 12:11 AM, Kevin <kevin.macksa...@gmail.com> wrote:
> > Hi,
> >
> > I have a cluster running YARN, and mapreduce jobs run as expected when
> they
> > are executed from one of the nodes. However, when I run Pig scripts from
> a
> > remote client, Pig connects to HDFS and HBase but runs its MapReduce job
> > using the LocalJobRunner. Jobs finish successfully, but they aren't using
> > the YARN architecture. I have placed all the configuration files in the
> Pig
> > configuration directory, and this must be right otherwise Pig wouldn't
> > connect to my cluster's HDFS and HBase.
> >
> > I have even put "mapreduce.framework.name=yarn" in the pig.properties
> file.
> >
> > Any ideas to get jobs submitted to a remote Hadoop cluster to work in
> > distributed mode?
> >
> > -Kevin
>
>
>
> --
> Harsh J
>
