Understood. We'll use the multi-threaded code we already have.

How are these execution slots filled up? I assume each slot is dedicated to
one submitted task.  If that's the case, how is each task distributed,
i.e. how is a given task run in a multi-node fashion?  Say 1000 batches/RDDs
are extracted out of Kafka; how does that relate to the number of executors
vs. task slots?

Presumably we can fill up the slots with multiple instances of the same
task... How do we know how many to launch?
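
For concreteness, here is a rough sketch of the kind of pipeline in question,
assuming the receiver-based KafkaUtils.createStream API; the topic, group,
ZooKeeper host, batch interval, and partition counts below are made up for
illustration:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val conf = new SparkConf().setAppName("kafka-to-search")
    val ssc = new StreamingContext(conf, Seconds(5))

    // One receiver with 4 consumer threads on the (made-up) topic "mytopic".
    val stream = KafkaUtils.createStream(ssc, "zkhost:2181", "mygroup", Map("mytopic" -> 4))

    // Spread processing of each batch's RDD across 16 tasks/slots.
    stream.repartition(16).foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        // Placeholder for the real work, e.g. indexing into a search engine.
        records.foreach(record => println(record))
      }
    }

    ssc.start()
    ssc.awaitTermination()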

On Mon, May 11, 2015 at 5:20 PM, Sean Owen <so...@cloudera.com> wrote:

> BTW I think my comment was wrong, as Marcelo demonstrated. In
> standalone mode you'd have one worker, and you do have one executor,
> but his explanation is right. You certainly have an execution slot
> for each core.
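>
> For example, a minimal sketch of capping the slot count when submitting
> against a standalone master (the host name and core count below are
> placeholders):
>
>     spark-submit \
>         --class "com.myco.Driver" \
>         --master spark://master-host:7077 \
>         --total-executor-cores 8 \
>         ./lib/myco.jar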
>
> Are you talking about your own user code? You can make threads, but
> that has nothing to do with Spark then. If you run code on your driver,
> it's not distributed. If you run Spark over an RDD with 1 partition,
> only one task works on it.
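>
> For example (a sketch; the file path and partition count are placeholders):
>
>     // A 1-partition RDD is processed by a single task...
>     val singlePartition = sc.textFile("input.txt", 1)
>     // ...repartitioning lets up to 8 tasks work on it in parallel.
>     val eightPartitions = singlePartition.repartition(8)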
>
> On Mon, May 11, 2015 at 10:16 PM, Dmitry Goldenberg
> <dgoldenberg...@gmail.com> wrote:
> > Sean,
> >
> > How does this model actually work? Let's say we want to run one job as N
> > threads executing one particular task, e.g. streaming data out of Kafka
> > into a search engine.  How do we configure our Spark job execution?
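> >
> > (Concretely, would the right approach be something like the following
> > sketch, where the topic, group, ZooKeeper host, and N = 4 receivers are
> > all made up?)
> >
> >     val numReceivers = 4
> >     val kafkaStreams = (1 to numReceivers).map { _ =>
> >       KafkaUtils.createStream(ssc, "zkhost:2181", "mygroup", Map("mytopic" -> 1))
> >     }
> >     val unifiedStream = ssc.union(kafkaStreams)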
> >
> > Right now, I'm seeing this job running as a single thread, and it's quite a
> > bit slower than just running a simple utility that does the same task with a
> > thread pool of N threads.
> >
> > The Kafka-Spark Streaming job I'm running is 7 times slower than the
> > utility.  What's holding Spark back?
> >
> > Thanks.
> >
> >
> > On Mon, May 11, 2015 at 4:55 PM, Sean Owen <so...@cloudera.com> wrote:
> >>
> >> You have one worker with one executor with 32 execution slots.
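> >>
> >> (A quick way to confirm this, as a sketch: with --master local[32],
> >> sc.defaultParallelism returns 32, and the Executors tab of the web UI at
> >> http://localhost:4040 shows one executor with 32 cores.)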
> >>
> >> On Mon, May 11, 2015 at 9:52 PM, dgoldenberg <dgoldenberg...@gmail.com>
> >> wrote:
> >> > Hi,
> >> >
> >> > Is there anything special one must do when running locally and
> >> > submitting a job like so:
> >> >
> >> > spark-submit \
> >> >         --class "com.myco.Driver" \
> >> >         --master local[*]  \
> >> >         ./lib/myco.jar
> >> >
> >> > In my logs, I'm only seeing log messages with the thread identifier of
> >> > "Executor task launch worker-0".
> >> >
> >> > There are 4 cores on the machine so I expected 4 threads to be at
> >> > play.
> >> > Running with local[32] did not yield 32 worker threads.
> >> >
> >> > Any recommendations? Thanks.
> >> >
> >> >
> >> >
> >
> >
>
