Hey Patrick,

A while back I posted an SBT recipe that lets users build Scala job assemblies excluding Spark and its deps, which I believe is what most people want. This lets you bundle your own libraries while leaving out Spark's, for the smallest possible jar.
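For concreteness, here is a rough sketch of the idea (not the exact recipe from that post; the coordinates and versions are illustrative for 0.7-era Spark, and it assumes the sbt-assembly plugin is already installed in your build):

// build.sbt (sketch)
name := "my-spark-job"

scalaVersion := "2.9.3"

libraryDependencies ++= Seq(
  // "provided": on the compile classpath, but left out of the assembly jar
  // together with its transitive deps; the cluster supplies Spark at runtime
  "org.spark-project" % "spark-core_2.9.3" % "0.7.3" % "provided",
  // your own libraries get bundled into the assembly as usual
  "joda-time" % "joda-time" % "2.2"
)

Then "sbt assembly" produces a jar containing only your code and libraries. One caveat: "provided" deps also disappear from the classpath that "sbt run" uses, so the build has to add them back for the run task if you want "run" to keep working.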
We don't use Spark's run script; instead we have SBT configured so that you can simply type "run" to run jobs. I believe this gives maximum developer velocity. We also have "sbt console" hooked up so that you can run the Spark shell from it, with no need for the ./spark-shell script (rough sketch in the P.S. below).

And, as you know, we are going to contribute back a job server. We believe that for most organizations this will provide the easiest way to submit and manage jobs: IT/OPS sets up Spark as an HTTP service (using the job server), and users/developers submit jobs to a managed service. We even have a giter8 template to make creating jobs for the job server super simple. The template has support for local run, Spark shell, assembly, and testing.

So anyway, I believe we'll have a lot to contribute to your guide, both now and especially once the job server is contributed. Feel free to touch base offline.

-Evan

On Fri, Aug 2, 2013 at 9:50 PM, Patrick Wendell <[email protected]> wrote:

> Hey All,
>
> I'm working on SPARK-800 [1]. The goal is to document a best practice or
> recommended way of bundling and running Spark jobs. We have a quickstart
> guide for writing a standalone job, but it doesn't cover how to deal with
> packaging up your dependencies and setting the correct environment
> variables required to submit a full job to a cluster. This can be a
> confusing process for beginners; it would be good to extend the guide to
> cover this.
>
> First, though, I wanted to sample this list and see how people tend to run
> Spark jobs inside their orgs. Knowing any of the following would be
> helpful:
>
> - Do you create an uber jar with all of your job's (and Spark's) recursive
> dependencies?
> - Do you try to use sbt run or maven exec, with some way to pass the
> correct environment variables?
> - Do you use a modified version of Spark's own `run` script?
> - Do you have some other way of submitting jobs?
>
> Any notes would be helpful in compiling this!
>
> [1] https://spark-project.atlassian.net/browse/SPARK-800

--
Evan Chan
Staff Engineer
[email protected] | http://www.ooyala.com/
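P.S. For the curious, the "sbt console" hookup is roughly the following. This is a sketch, assuming Spark 0.7-era package names (spark.SparkContext) and sbt 0.12-style settings; the local master URL and app name are placeholders.

// build.sbt (sketch)
initialCommands in console := """
  import spark.SparkContext
  import spark.SparkContext._

  // pre-create a context so the REPL starts with "sc" defined, like ./spark-shell
  val sc = new SparkContext("local[4]", "sbt-console")
"""

// stop the context cleanly when you quit the REPL
cleanupCommands in console := "sc.stop()"

With that in place, typing "sbt console" drops you into a REPL with sc already available.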
