Glad that you found the issue, Xiaochuan!

Should you decide to use the HttpFileSystem, please set the yarn.package.path
config to point to the HTTP URI of your job's binary.
Do let us know if you hit any snags down that path!
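
For reference, a minimal sketch of that config, assuming the HttpFileSystem
class that ships with samza-yarn and a placeholder URL for wherever you serve
the job package from:

    # Register the HTTP filesystem for the http:// scheme and point
    # yarn.package.path at the job tarball served over HTTP.
    fs.http.impl=org.apache.samza.util.hadoop.HttpFileSystem
    yarn.package.path=http://your.host.example.com/path/to/my-job-dist.tar.gz

The host and file name above are placeholders; substitute wherever you
actually publish the package.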

Best,

On Sat, Sep 23, 2017 at 5:24 PM XiaoChuan Yu <xiaochuan...@kik.com> wrote:

> I found out that it was necessary to include "hadoop-aws" as a part of the
> package submitted to YARN, similar to the instructions for deploying from
> HDFS
> <https://samza.apache.org/learn/tutorials/0.7.0/deploy-samza-job-from-hdfs.html>.
> However, due to a dependency conflict on the AWS SDK between our code and
> "hadoop-aws", we can't actually include it.
> We are now planning to make use of HTTP FS instead.
>
> On Fri, Sep 15, 2017 at 2:45 PM Jagadish Venkatraman <jagadish1...@gmail.com>
> wrote:
>
> > Thank you Xiaochuan for your question!
> >
> > You should ensure that *every machine in your cluster* has the S3 jar
> > file in its YARN class-path. From your error, it looks like the machine
> > you are running on does not have the JAR file corresponding to
> > *S3AFileSystem*.
> >
> > >> What's the right way to set this up? Should I just copy over the
> > >> required AWS jars to the Hadoop conf directory?
> >
> > I'd err on the side of simplicity; the *scp* route seems to address
> > most of your needs.
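> >
> > A minimal sketch of that scp route, with hypothetical hostnames and jar
> > versions (pair the hadoop-aws jar with the aws-java-sdk build it was
> > compiled against), copying into a directory that is already on the YARN
> > class-path:
> >
> >     # Copy the S3A filesystem jar and its AWS SDK dependency to every node.
> >     for host in node1 node2 node3; do
> >       scp hadoop-aws-2.7.3.jar aws-java-sdk-1.7.4.jar \
> >         ec2-user@"$host":/home/ec2-user/deploy/yarn/share/hadoop/common/lib/
> >     done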
> >
> > >> Should I be editing run-job.sh or run-class.sh?
> >
> > You should not have to edit any of these files. Once you fix your
> > class-paths by copying those relevant JARs, it should just work.
> >
> > Please let us know if you need more assistance.
> >
> > --
> > Jagdish
> >
> >
> > On Fri, Sep 15, 2017 at 11:07 AM, XiaoChuan Yu <xiaochuan...@kik.com>
> > wrote:
> >
> > > Hi,
> > >
> > > I'm trying to deploy a Samza job using YARN and S3 where I upload the
> > > zip package to S3 and point yarn.package.path to it.
> > > Does anyone know what setup steps are required for this?
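> > >
> > > For concreteness, a minimal sketch of this setup (the bucket and file
> > > name are made up; the fs.s3a.* key is a standard Hadoop setting, though
> > > it may belong in core-site.xml rather than the job properties depending
> > > on the setup):
> > >
> > >     yarn.package.path=s3a://my-bucket/samza/my-job-dist.tar.gz
> > >     # S3AFileSystem is provided by the hadoop-aws module; credentials
> > >     # can come from fs.s3a.access.key/fs.s3a.secret.key or an EC2
> > >     # instance profile.
> > >     fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem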
> > >
> > > What I've tried so far is to run Hello Samza this way in AWS.
> > >
> > > However, I ran into the following exception:
> > > Exception in thread "main" java.lang.RuntimeException:
> > > java.lang.ClassNotFoundException: Class
> > > org.apache.hadoop.fs.s3a.S3AFileSystem not found
> > > at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2112)
> > > at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2578)
> > > at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
> > > at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
> > > ...
> > >
> > > Running "$YARN_HOME/bin/yarn classpath" gives the following:
> > > /home/ec2-user/deploy/yarn/etc/hadoop
> > > /home/ec2-user/deploy/yarn/etc/hadoop
> > > /home/ec2-user/deploy/yarn/etc/hadoop
> > > /home/ec2-user/deploy/yarn/share/hadoop/common/lib/*
> > > /home/ec2-user/deploy/yarn/share/hadoop/common/*
> > > /home/ec2-user/deploy/yarn/share/hadoop/hdfs
> > > /home/ec2-user/deploy/yarn/share/hadoop/hdfs/lib/*
> > > /home/ec2-user/deploy/yarn/share/hadoop/hdfs/*
> > > /home/ec2-user/deploy/yarn/share/hadoop/yarn/lib/*
> > > /home/ec2-user/deploy/yarn/share/hadoop/yarn/*
> > > /home/ec2-user/deploy/yarn/share/hadoop/mapreduce/lib/*
> > > /home/ec2-user/deploy/yarn/share/hadoop/mapreduce/*
> > > /contrib/capacity-scheduler/*.jar
> > > /home/ec2-user/deploy/yarn/share/hadoop/yarn/*
> > > /home/ec2-user/deploy/yarn/share/hadoop/yarn/lib/*
> > >
> > > I manually copied the required AWS-related jars to
> > > /home/ec2-user/deploy/yarn/share/hadoop/common.
> > > I checked that the class is loadable by running "yarn
> > > org.apache.hadoop.fs.s3a.S3AFileSystem", which gives a "Main method not
> > > found" error instead of class not found.
> > >
> > > From the console output of run-job.sh, I see the following in the
> > > class path:
> > > 1. All jars under the lib directory of the zip package
> > > 2. /home/ec2-user/deploy/yarn/etc/hadoop (Hadoop conf directory)
> > >
> > > The class path from run-job.sh seems to be missing the AWS-related jars
> > > required for S3AFileSystem.
> > > What's the right way to set this up?
> > > Should I just copy over the required AWS jars to the Hadoop conf
> > > directory (2.)?
> > > Should I be editing run-job.sh or run-class.sh?
> > >
> > > Thanks,
> > > Xiaochuan Yu
> > >
> >
> >
> >
> > --
> > Jagadish V,
> > Graduate Student,
> > Department of Computer Science,
> > Stanford University
> >
>
-- 
Sent from my iPhone.
