Hey Patrick,

A while back I posted an SBT recipe that lets users build Scala job assemblies excluding Spark and its deps, which I believe is what most people want. This lets you bundle your own libraries while leaving out Spark's, for the smallest possible jar.
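For concreteness, here is a rough sketch of the idea (not the exact recipe from that post; the coordinates and versions are illustrative for 0.7-era Spark, and it assumes the sbt-assembly plugin is already installed in your build):

// build.sbt (sketch)
name := "my-spark-job"

scalaVersion := "2.9.3"

libraryDependencies ++= Seq(
  // "provided": on the compile classpath, but left out of the assembly jar
  // together with its transitive deps; the cluster supplies Spark at runtime
  "org.spark-project" % "spark-core_2.9.3" % "0.7.3" % "provided",
  // your own libraries get bundled into the assembly as usual
  "joda-time" % "joda-time" % "2.2"
)

Then "sbt assembly" produces a jar containing only your code and libraries. One caveat: "provided" deps also disappear from the classpath that "sbt run" uses, so the build has to add them back for the run task if you want "run" to keep working.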
We don't use Spark's run script; instead we have SBT configured so that you can simply type "run" to run jobs. I believe this gives maximum developer velocity. We also have "sbt console" hooked up so that you can run the Spark shell from it, with no need for the ./spark-shell script (rough sketch in the P.S. below).

And, as you know, we are going to contribute back a job server. We believe that for most organizations this will provide the easiest way to submit and manage jobs: IT/OPS sets up Spark as an HTTP service (using the job server), and users/developers submit jobs to a managed service. We even have a giter8 template to make creating jobs for the job server super simple. The template has support for local run, Spark shell, assembly, and testing.

So anyway, I believe we'll have a lot to contribute to your guide, both now and especially once the job server is contributed. Feel free to touch base offline.

-Evan

On Fri, Aug 2, 2013 at 9:50 PM, Patrick Wendell <[email protected]> wrote:

> Hey All,
>
> I'm working on SPARK-800 [1]. The goal is to document a best practice or
> recommended way of bundling and running Spark jobs. We have a quickstart
> guide for writing a standalone job, but it doesn't cover how to deal with
> packaging up your dependencies and setting the correct environment
> variables required to submit a full job to a cluster. This can be a
> confusing process for beginners; it would be good to extend the guide to
> cover this.
>
> First, though, I wanted to sample this list and see how people tend to run
> Spark jobs inside their orgs. Knowing any of the following would be
> helpful:
>
> - Do you create an uber jar with all of your job's (and Spark's) recursive
> dependencies?
> - Do you try to use sbt run or maven exec, with some way to pass the
> correct environment variables?
> - Do you use a modified version of Spark's own `run` script?
> - Do you have some other way of submitting jobs?
>
> Any notes would be helpful in compiling this!
>
> [1] https://spark-project.atlassian.net/browse/SPARK-800

--
Evan Chan
Staff Engineer
[email protected] | http://www.ooyala.com/
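P.S. For the curious, the "sbt console" hookup is roughly the following. This is a sketch, assuming Spark 0.7-era package names (spark.SparkContext) and sbt 0.12-style settings; the local master URL and app name are placeholders.

// build.sbt (sketch)
initialCommands in console := """
  import spark.SparkContext
  import spark.SparkContext._

  // pre-create a context so the REPL starts with "sc" defined, like ./spark-shell
  val sc = new SparkContext("local[4]", "sbt-console")
"""

// stop the context cleanly when you quit the REPL
cleanupCommands in console := "sc.stop()"

With that in place, typing "sbt console" drops you into a REPL with sc already available.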
