Hey All, I'm working on SPARK-800 [1]. The goal is to document a best practice or recommended way of bundling and running Spark jobs. We have a quickstart guide for writing a standalone job, but it doesn't cover how to package up your dependencies or set the environment variables required to submit a full job to a cluster. This can be a confusing process for beginners, so it would be good to extend the guide to cover it.
First, though, I wanted to sample this list and see how people tend to run Spark jobs inside their orgs. Knowing any of the following would be helpful:

- Do you create an uber jar with all of your job's (and Spark's) recursive dependencies? (See the sbt sketch below for the kind of thing I mean.)
- Do you use sbt run or maven exec with some way to pass the correct environment variables?
- Do you use a modified version of Spark's own `run` script?
- Do you have some other way of submitting jobs?

Any notes would be helpful in compiling this!

[1] https://spark-project.atlassian.net/browse/SPARK-800
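For the uber-jar option, here's a minimal sketch of the kind of build I have in mind, using sbt-assembly. The project name and the Scala, Spark, and plugin versions are just placeholders for whatever your build actually uses, not a recommendation:

    // project/plugins.sbt -- adds the `assembly` task (plugin version is a placeholder)
    addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "2.1.5")

    // build.sbt -- name and versions are placeholders
    name := "my-spark-job"
    scalaVersion := "2.12.18"

    // Mark Spark itself as "provided" so the cluster's own Spark jars are used
    // at runtime and are not bundled into the assembly jar.
    libraryDependencies += "org.apache.spark" %% "spark-core" % "3.5.1" % "provided"

Running `sbt assembly` then produces a single jar under target/ containing the job's classes plus all of its non-provided transitive dependencies, which can be shipped to the cluster as one artifact.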
