the link you sent says multiple executors per node

Worker is just demon process launching Executors / JVMs so it can execute tasks 
- it does that by cooperating with the master and the driver 

There is a one to one maping between Executor and JVM 


Sent from Samsung Mobile

<div>-------- Original message --------</div><div>From: Arush Kharbanda 
<ar...@sigmoidanalytics.com> </div><div>Date:2015/05/26  10:55  (GMT+00:00) 
</div><div>To: canan chen <ccn...@gmail.com> </div><div>Cc: Evo Eftimov 
<evo.efti...@isecc.com>,user@spark.apache.org </div><div>Subject: Re: How does 
spark manage the memory of executor with multiple tasks </div><div>
</div>Hi Evo,

Worker is the JVM and an executor runs on the JVM. And after Spark 1.4 you 
would be able to run multiple executors on the same JVM/worker.

https://issues.apache.org/jira/browse/SPARK-1706.

Thanks
Arush

On Tue, May 26, 2015 at 2:54 PM, canan chen <ccn...@gmail.com> wrote:
I think the concept of task in spark should be on the same level of task in MR. 
Usually in MR, we need to specify the memory the each mapper/reducer task. And 
I believe executor is not a user-facing concept, it's a spark internal concept. 
For spark users they don't need to know the concept of executor, but need to 
know the concept of task. 

On Tue, May 26, 2015 at 5:09 PM, Evo Eftimov <evo.efti...@isecc.com> wrote:
This is the first time I hear that “one can specify the RAM per task” – the RAM 
is granted per Executor (JVM). On the other hand each Task operates on ONE RDD 
Partition – so you can say that this is “the RAM allocated to the Task to 
process” – but it is still within the boundaries allocated to the Executor 
(JVM) within which the Task is running. Also while running, any Task like any 
JVM Thread can request as much additional RAM e.g. for new Object instances  as 
there is available in the Executor aka JVM Heap  

 

From: canan chen [mailto:ccn...@gmail.com] 
Sent: Tuesday, May 26, 2015 9:30 AM
To: Evo Eftimov
Cc: user@spark.apache.org
Subject: Re: How does spark manage the memory of executor with multiple tasks

 

Yes, I know that one task represent a JVM thread. This is what I confused. 
Usually users want to specify the memory on task level, so how can I do it if 
task if thread level and multiple tasks runs in the same executor. And even I 
don't know how many threads there will be. Besides that, if one task cause OOM, 
it would cause other tasks in the same executor fail too. There's no isolation 
between tasks.  

 

On Tue, May 26, 2015 at 4:15 PM, Evo Eftimov <evo.efti...@isecc.com> wrote:

An Executor is a JVM instance spawned and running on a Cluster Node (Server 
machine). Task is essentially a JVM Thread – you can have as many Threads as 
you want per JVM. You will also hear about “Executor Slots” – these are 
essentially the CPU Cores available on the machine and granted for use to the 
Executor

 

Ps: what creates ongoing confusion here is that the Spark folks have “invented” 
their own terms to describe the design of their what is essentially a 
Distributed OO Framework facilitating Parallel Programming and Data Management 
in a Distributed Environment, BUT have not provided clear 
dictionary/explanations linking these “inventions” with standard concepts 
familiar to every Java, Scala etc developer  

 

From: canan chen [mailto:ccn...@gmail.com] 
Sent: Tuesday, May 26, 2015 9:02 AM
To: user@spark.apache.org
Subject: How does spark manage the memory of executor with multiple tasks

 

Since spark can run multiple tasks in one executor, so I am curious to know how 
does spark manage memory across these tasks. Say if one executor takes 1GB 
memory, then if this executor can run 10 tasks simultaneously, then each task 
can consume 100MB on average. Do I understand it correctly ? It doesn't make 
sense to me that spark run multiple tasks in one executor. 

 





-- 


Arush Kharbanda || Technical Teamlead

ar...@sigmoidanalytics.com || www.sigmoidanalytics.com

Reply via email to