Repository: incubator-samza
Updated Branches:
  refs/heads/master 048ffd2fe -> c932c5029

SAMZA-181; add a tutorial to show how to run samza jobs from HDFS.

Project: http://git-wip-us.apache.org/repos/asf/incubator-samza/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-samza/commit/c932c502
Tree: http://git-wip-us.apache.org/repos/asf/incubator-samza/tree/c932c502
Diff: http://git-wip-us.apache.org/repos/asf/incubator-samza/diff/c932c502

Branch: refs/heads/master
Commit: c932c50298bdd8cd6022286d364aa6daaaaa5667
Parents: 048ffd2
Author: Yan Fang <[email protected]>
Authored: Thu Mar 13 12:48:35 2014 -0700
Committer: Chris Riccomini <[email protected]>
Committed: Thu Mar 13 12:48:35 2014 -0700

----------------------------------------------------------------------
 .../0.7.0/deploy-samza-job-from-hdfs.md | 55 ++++++++++++++++++++
 docs/learn/tutorials/0.7.0/index.md     |  2 +
 2 files changed, 57 insertions(+)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/incubator-samza/blob/c932c502/docs/learn/tutorials/0.7.0/deploy-samza-job-from-hdfs.md
----------------------------------------------------------------------
diff --git a/docs/learn/tutorials/0.7.0/deploy-samza-job-from-hdfs.md b/docs/learn/tutorials/0.7.0/deploy-samza-job-from-hdfs.md
new file mode 100644
index 0000000..ca5f599
--- /dev/null
+++ b/docs/learn/tutorials/0.7.0/deploy-samza-job-from-hdfs.md
@@ -0,0 +1,55 @@
+---
+layout: page
+title: Deploying a Samza job from HDFS
+---
+
+This tutorial uses [hello-samza](../../../startup/hello-samza/0.7.0/) to illustrate how to run a Samza job if you want to publish the Samza job's .tar.gz package to HDFS.
+
+### Build a new Samza job package
+
+Build a new Samza job package to include the hadoop-hdfs-version.jar.
+
+* Add dependency statement in pom.xml of samza-job-package
+
+```
+<dependency>
+  <groupId>org.apache.hadoop</groupId>
+  <artifactId>hadoop-hdfs</artifactId>
+  <version>2.2.0</version>
+</dependency>
+```
+
+* Add the following code to src/main/assembly/src.xml in samza-job-package.
+
+```
+<include>org.apache.hadoop:hadoop-hdfs</include>
+```
+
+* Create .tar.gz package
+
+```
+mvn clean package
+```
+
+* Make sure hadoop-common-version.jar has the same version as your hadoop-hdfs-version.jar. Otherwise, you may still have errors.
+
+### Upload the package
+
+```
+hadoop fs -put ./samza-job-package/target/samza-job-package-0.7.0-dist.tar.gz /path/for/tgz
+```
+
+### Add HDFS configuration
+
+Put the hdfs-site.xml file of your cluster into ~/.samza/conf directory. (The same place as the yarn-site.xml)
+
+### Change properties file
+
+Change the yarn.package.path in the properties file to your HDFS location.
+
+```
+yarn.package.path=hdfs://<hdfs name node ip>:<hdfs name node port>/path/to/tgz
+```
+
+Then you should be able to run the Samza job as described in [hello-samza](../../../startup/hello-samza/0.7.0/).
+

http://git-wip-us.apache.org/repos/asf/incubator-samza/blob/c932c502/docs/learn/tutorials/0.7.0/index.md
----------------------------------------------------------------------
diff --git a/docs/learn/tutorials/0.7.0/index.md b/docs/learn/tutorials/0.7.0/index.md
index 42283ef..e40861e 100644
--- a/docs/learn/tutorials/0.7.0/index.md
+++ b/docs/learn/tutorials/0.7.0/index.md
@@ -5,6 +5,8 @@ title: Tutorials
 
 [Remote Debugging with Samza](remote-debugging-samza.html)
 
+[Deploying a Samza Job from HDFS](deploy-samza-job-from-hdfs.html)
+
 <!-- TODO a bunch of tutorials
 [Log Walkthrough](log-walkthrough.html)
 <a href="configuring-kafka-system.html">Configuring a Kafka System</a><br/>
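
The tutorial added by this commit asks the reader to make sure the hadoop-common and hadoop-hdfs jars carry the same version. A minimal shell sketch of that check follows. It assumes both jars sit in one directory and are named `<artifact>-<version>.jar`. The function names `jar_version` and `check_hadoop_versions` are hypothetical helpers for illustration and are not part of Samza or Hadoop.

```shell
#!/bin/sh
# Hypothetical helper: print the version of <artifact>-<version>.jar found in a directory.
# $1 = directory holding the jars, $2 = artifact name (for example hadoop-hdfs).
jar_version() {
  for jar in "$1"/"$2"-[0-9]*.jar; do
    [ -e "$jar" ] || continue      # the glob matched nothing
    name=${jar##*/}                # drop the directory part
    name=${name#"$2"-}             # drop the "<artifact>-" prefix
    printf '%s\n' "${name%.jar}"   # drop the ".jar" suffix
    return 0
  done
  return 1
}

# Hypothetical helper: compare the hadoop-hdfs and hadoop-common versions.
# Returns 0 on a match, 1 on a mismatch, 2 if either jar is missing.
check_hadoop_versions() {
  hdfs=$(jar_version "$1" hadoop-hdfs) || { echo "hadoop-hdfs jar not found in $1"; return 2; }
  common=$(jar_version "$1" hadoop-common) || { echo "hadoop-common jar not found in $1"; return 2; }
  if [ "$hdfs" = "$common" ]; then
    echo "OK: both at $hdfs"
  else
    echo "MISMATCH: hadoop-hdfs $hdfs vs hadoop-common $common"
    return 1
  fi
}
```

It could be run against the lib directory of an unpacked job package, for example `check_hadoop_versions ./deploy/samza/lib`, where that path is an assumption about the local layout.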
