Hey All, I'm working on SPARK-800 [1]. The goal is to document a best practice or recommended way of bundling and running Spark jobs. We have a quickstart guide for writing a standalone job, but it doesn't cover how to package up your dependencies or set the environment variables required to submit a full job to a cluster. This can be a confusing process for beginners, so it would be good to extend the guide to cover it.
First, though, I wanted to sample this list and see how people tend to run Spark jobs inside their orgs. Knowing any of the following would be helpful:

- Do you create an uber jar with all of your job's (and Spark's) recursive dependencies? (See the sbt sketch below for the kind of thing I mean.)
- Do you use sbt run or maven exec with some way to pass the correct environment variables?
- Do you use a modified version of Spark's own `run` script?
- Do you have some other way of submitting jobs?

Any notes would be helpful in compiling this!

[1] https://spark-project.atlassian.net/browse/SPARK-800
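For the uber-jar option, here's a minimal sketch of the kind of build I have in mind, using sbt-assembly. The project name and the Scala, Spark, and plugin versions are just placeholders for whatever your build actually uses, not a recommendation:

    // project/plugins.sbt -- adds the `assembly` task (plugin version is a placeholder)
    addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "2.1.5")

    // build.sbt -- name and versions are placeholders
    name := "my-spark-job"
    scalaVersion := "2.12.18"

    // Mark Spark itself as "provided" so the cluster's own Spark jars are used
    // at runtime and are not bundled into the assembly jar.
    libraryDependencies += "org.apache.spark" %% "spark-core" % "3.5.1" % "provided"

Running `sbt assembly` then produces a single jar under target/ containing the job's classes plus all of its non-provided transitive dependencies, which can be shipped to the cluster as one artifact.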
