I think the concept of a task in Spark should be at the same level as a
task in MR. In MR we usually need to specify the memory for each
mapper/reducer task. And I believe the executor is not a user-facing
concept; it's a Spark-internal one. Spark users don't need to know about
executors, but they do need to know about tasks.
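
For comparison, a minimal sketch of how memory is requested in each model
(the property names are the standard Hadoop/Spark ones; the values are only
illustrative):

  // MapReduce: memory is requested per *task*
  //   mapreduce.map.memory.mb    = 2048
  //   mapreduce.reduce.memory.mb = 4096

  // Spark: memory is requested per *executor* and shared by its tasks
  val conf = new org.apache.spark.SparkConf()
    .set("spark.executor.memory", "4g") // heap for the whole executor JVM
    .set("spark.executor.cores", "4")   // up to 4 tasks run in it concurrently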

On Tue, May 26, 2015 at 5:09 PM, Evo Eftimov <evo.efti...@isecc.com> wrote:

> This is the first time I hear that “one can specify the RAM per task” –
> RAM is granted per Executor (JVM). On the other hand, each Task operates on
> ONE RDD Partition, so you could call that “the RAM allocated to the Task to
> process” – but it still lies within the boundaries allocated to the
> Executor (JVM) within which the Task is running. Also, while running, any
> Task, like any JVM Thread, can request as much additional RAM (e.g. for new
> Object instances) as is available in the Executor, aka the JVM Heap.
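>
> As an illustration of that last point, here is a hypothetical map closure
> (rdd and process() are made-up placeholders, the allocation size is
> arbitrary): it can grab far more than any notional per-task share, as long
> as the Executor heap has room:
>
>   rdd.map { x =>
>     // succeeds whenever the *executor* heap has 256MB free, no matter
>     // how many other tasks currently share the same JVM
>     val scratch = new Array[Byte](256 * 1024 * 1024)
>     process(x, scratch) // hypothetical helper
>   }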
>
>
>
> *From:* canan chen [mailto:ccn...@gmail.com]
> *Sent:* Tuesday, May 26, 2015 9:30 AM
> *To:* Evo Eftimov
> *Cc:* user@spark.apache.org
> *Subject:* Re: How does spark manage the memory of executor with multiple
> tasks
>
>
>
> Yes, I know that one task represents a JVM thread. This is what confuses
> me. Usually users want to specify memory at the task level, so how can I do
> that if a task is thread-level and multiple tasks run in the same executor?
> I don't even know how many threads there will be. Besides that, if one task
> causes an OOM, it will cause the other tasks in the same executor to fail
> too. There's no isolation between tasks.
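>
> I guess one workaround would be to force one task per executor, so the
> executor memory setting effectively becomes a per-task limit (a sketch
> with made-up values, at the cost of more, smaller JVMs):
>
>   val conf = new org.apache.spark.SparkConf()
>     .set("spark.executor.cores", "1")   // one task at a time per executor
>     .set("spark.executor.memory", "1g") // now effectively memory per task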
>
>
>
> On Tue, May 26, 2015 at 4:15 PM, Evo Eftimov <evo.efti...@isecc.com>
> wrote:
>
> An Executor is a JVM instance spawned and running on a Cluster Node
> (Server machine). A Task is essentially a JVM Thread – you can have as many
> Threads as you want per JVM. You will also hear about “Executor Slots” –
> these are essentially the CPU Cores available on the machine and granted
> for use to the Executor.
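>
> (A plain-JVM analogy, not actual Spark API: an Executor behaves like a
> fixed-size thread pool whose size is its slot count, with all worker
> threads sharing one heap.)
>
>   import java.util.concurrent.Executors
>
>   val slots = 4 // “Executor Slots” = cores granted to the Executor
>   val pool  = Executors.newFixedThreadPool(slots)
>   for (partition <- 1 to 10) pool.submit(new Runnable {
>     def run(): Unit = {
>       // each “task” is just a thread; all share the same JVM heap
>       println(s"partition $partition on ${Thread.currentThread.getName}")
>     }
>   })
>   pool.shutdown()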
>
>
>
> Ps: what creates ongoing confusion here is that the Spark folks have
> “invented” their own terms to describe the design of what is essentially a
> Distributed OO Framework facilitating Parallel Programming and Data
> Management in a Distributed Environment, BUT have not provided a clear
> dictionary/explanation linking these “inventions” to standard concepts
> familiar to every Java, Scala etc. developer.
>
>
>
> *From:* canan chen [mailto:ccn...@gmail.com]
> *Sent:* Tuesday, May 26, 2015 9:02 AM
> *To:* user@spark.apache.org
> *Subject:* How does spark manage the memory of executor with multiple
> tasks
>
>
>
> Since Spark can run multiple tasks in one executor, I am curious how Spark
> manages memory across these tasks. Say one executor gets 1GB of memory and
> can run 10 tasks simultaneously; then each task can consume 100MB on
> average. Do I understand that correctly? It doesn't quite make sense to me
> that Spark runs multiple tasks in one executor.
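>
> In other words, I am picturing something like this (a back-of-envelope
> sketch with made-up numbers):
>
>   val executorHeapMB  = 1024 // spark.executor.memory = 1g
>   val concurrentTasks = 10   // 10 tasks running at once
>   val perTaskShareMB  = executorHeapMB / concurrentTasks // 100MB each?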
>
>
>
