Merge pull request #137 from tgravescs/sparkYarnJarsHdfsRebase

Allow Spark on YARN to be run from HDFS.
Allows the spark.jar, app.jar, and log4j.properties files to be placed in HDFS. The files may also live on a different HDFS cluster, in which case they are copied over. File permissions are set correctly, and files are put into the public distributed cache when their permissions allow it, so they can be reused across users. Also adds error handling for missing arguments.

Project: http://git-wip-us.apache.org/repos/asf/incubator-spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-spark/commit/f49ea28d
Tree: http://git-wip-us.apache.org/repos/asf/incubator-spark/tree/f49ea28d
Diff: http://git-wip-us.apache.org/repos/asf/incubator-spark/diff/f49ea28d

Branch: refs/heads/master
Commit: f49ea28d25728e19e56b140a2f374631c94153bc
Parents: 87f2f4e 17bb9a2
Author: Matei Zaharia <[email protected]>
Authored: Tue Nov 12 19:13:39 2013 -0800
Committer: Matei Zaharia <[email protected]>
Committed: Tue Nov 12 19:13:39 2013 -0800

----------------------------------------------------------------------
 docs/running-on-yarn.md                       |   1 +
 pom.xml                                       |   6 +
 project/SparkBuild.scala                      |   3 +-
 yarn/pom.xml                                  |  50 ++++
 .../spark/deploy/yarn/ApplicationMaster.scala |   2 +-
 .../org/apache/spark/deploy/yarn/Client.scala | 276 ++++++++++---------
 .../yarn/ClientDistributedCacheManager.scala  | 228 +++++++++++++++
 .../spark/deploy/yarn/WorkerRunnable.scala    |  42 +--
 .../ClientDistributedCacheManagerSuite.scala  | 220 +++++++++++++++
 9 files changed, 655 insertions(+), 173 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-spark/blob/f49ea28d/project/SparkBuild.scala
----------------------------------------------------------------------
