[ https://issues.apache.org/jira/browse/SAMZA-171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13921345#comment-13921345 ]
Chris Riccomini commented on SAMZA-171:
---------------------------------------
Yeah, this is undocumented, and should be fixed. I'm going to paste in an email
that explains how to use HTTP with YARN/Samza:
{noformat}
The next step is to build your Samza job package (the .tar.gz file that
contains the bin and lib directories). Take a look at hello-samza, which
shows how to build a .tar.gz file with the appropriate files in it.
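For reference, the layout should look roughly like this (the file names
here are placeholders; the exact contents depend on your build):
tar -tzf your-job-package.tar.gz
bin/run-am.sh
bin/run-job.sh
lib/your-job.jar
lib/samza-core_<scala-version>-<samza-version>.jar
...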
Once you have the .tar.gz file built, you need to publish it somewhere.
This can be HDFS or an HTTP server.
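For example, if you publish to HDFS, something like this should do it (the
destination path is just a placeholder):
hadoop fs -put your-job-package.tar.gz hdfs://foo/bar/your-job-package.tar.gz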
== IF YOU USE HDFS, SKIP THIS STEP ==
At LinkedIn, we use an HTTP server. The easiest way to hack this up for
testing is to start a local HTTP server on your developer box with Python:
python -m SimpleHTTPServer
This command will start a simple HTTP server serving files from the
current working directory. So, running that command from the directory
with your .tar.gz job package should work.
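If you have curl handy, you can sanity-check that the package is reachable
(SimpleHTTPServer listens on port 8000 by default):
curl -I http://localhost:8000/your-job-package.tar.gz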
You then need to set up your NMs to be able to read HTTP files, since
Hadoop doesn't support an HTTP-based file system implementation out of the
box. Fortunately, Samza ships with one. To use it, you need to do two
things:
First, add this to your NM's core-site.xml:
<configuration>
<property>
<name>fs.http.impl</name>
<value>org.apache.samza.util.hadoop.HttpFileSystem</value>
</property>
</configuration>
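The core-site.xml file lives under your Hadoop configuration directory (on
a stock Hadoop 2.2 install that's typically
hadoop-2.2.0/etc/hadoop/core-site.xml), and the NM has to be restarted to
pick up the change, e.g.:
hadoop-2.2.0/sbin/yarn-daemon.sh stop nodemanager
hadoop-2.2.0/sbin/yarn-daemon.sh start nodemanager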
Second, make sure that you put the following jars into your NM's class
path:
* grizzled-slf4j
* samza-yarn
* scala-compiler
* scala-library
Make sure that all of these libraries are built against the same Scala
version as samza-yarn.
The easiest way to add everything to your NM's class path is to put the
files in the lib directory:
hadoop-2.2.0/share/hadoop/hdfs/lib
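For example (the jar names and versions below are placeholders; use the
ones your samza-yarn build actually depends on):
cp grizzled-slf4j_*.jar samza-yarn_*.jar scala-compiler-*.jar \
   scala-library-*.jar hadoop-2.2.0/share/hadoop/hdfs/lib/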
== END OF "IF YOU USE HDFS, SKIP THIS STEP" SECTION ==
Now, you should have a .tar.gz file with a URI that's either:
hdfs://foo/bar/your-job-package.tar.gz
Or:
http://192.168.0.1/your-job-package.tar.gz
This path (either the HDFS or HTTP one, depending on which you chose to
use) is what you should set your yarn.package.path configuration parameter
to in your job's configuration file.
yarn.package.path=http://192.168.0.1/your-job-package.tar.gz
This tells YARN's NMs where to download your job package from when YARN
begins running it in the grid.
Finally, you'll want to start your job!
1. Make sure that you're using the YarnJobFactory for your
job.factory.class configuration setting (see hello-samza for an example,
and the sketch after these steps).
2. Get a copy of one of your NM's yarn-site.xml and put it somewhere on
your desktop (I usually use ~/.yarn/conf/yarn-site.xml). Note that there's
a "conf" directory there. This is mandatory.
3. Set up an environment variable called YARN_HOME that points to the
directory that has the "conf" directory in it:
export YARN_HOME=~/.yarn
4. Execute your job with run-job.sh (see
http://samza.incubator.apache.org/startup/hello-samza/0.7.0/ for an
example).
This should start the job on your YARN grid.
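Putting it together, a minimal end-to-end sketch might look like this (host
names, paths, and the job name are placeholders):
mkdir -p ~/.yarn/conf
scp your-nm-host:/path/to/yarn-site.xml ~/.yarn/conf/
export YARN_HOME=~/.yarn
# your-job.properties should contain at least:
#   job.factory.class=org.apache.samza.job.yarn.YarnJobFactory
#   yarn.package.path=http://192.168.0.1/your-job-package.tar.gz
path/to/your-package/bin/run-job.sh \
  --config-factory=org.apache.samza.config.factories.PropertiesConfigFactory \
  --config-path=file://$PWD/config/your-job.properties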
{noformat}
> Http Yarn package path failing
> ------------------------------
>
> Key: SAMZA-171
> URL: https://issues.apache.org/jira/browse/SAMZA-171
> Project: Samza
> Issue Type: Bug
> Reporter: Ethan Setnik
>
> When specifying an HTTP package path for YARN jobs, the jobs fail with the
> error:
> 14/03/05 16:28:40 WARN security.UserGroupInformation:
> PriviledgedActionException as:samza (auth:SIMPLE) cause:java.io.IOException:
> No FileSystem for scheme: http
> 14/03/05 16:28:40 INFO localizer.ResourceLocalizationService: DEBUG: FAILED {
> http://s3.amazonaws.com/samza_packages/wikipedia-job-package.tar.gz, 0,
> ARCHIVE, null }, No FileSystem for scheme: http
> 14/03/05 16:28:40 INFO localizer.LocalizedResource: Resource
> http://s3.amazonaws.com/samza_packages/wikipedia-job-package.tar.gz
> transitioned from DOWNLOADING to FAILED
> 14/03/05 16:28:40 INFO container.Container: Container
> container_1394035672475_0003_02_000001 transitioned from LOCALIZING to
> LOCALIZATION_FAILED
> 14/03/05 16:28:40 INFO localizer.LocalResourcesTrackerImpl: Container
> container_1394035672475_0003_02_000001 sent RELEASE event on a resource
> request {
> http://s3.amazonaws.com/samza_packages/wikipedia-job-package.tar.gz, 0,
> ARCHIVE, null } not present in cache.
> yarn.package.path=http://s3.amazonaws.com/samza_packages/wikipedia-job-package.tar.gz
> It looks like some work has already been done to support this feature by
> configuring "fs.http.impl". I also noticed that this configuration was
> updated in SAMZA-63.
> hConfig.set("fs.http.impl", classOf[HttpFileSystem].getName)
> However, my understanding is that the job package itself contains the
> necessary HttpFileSystem class to load HTTP packages, and YARN does not
> support this configuration out of the box, so I'm at a loss as to how to
> load a remote package over HTTP.