RE: How does spark manage the memory of executor with multiple tasks

2015-05-27 Thread java8964
Like you, lots of people come from the MapReduce world and try to understand the internals of Spark. I hope the following helps in some way. End users only have the concept of a job: "I want to run a word count on this one big file" is the job I want to run. How many

Re: How does spark manage the memory of executor with multiple tasks

2015-05-27 Thread canan chen
Thanks Yong, this is very helpful. I also found ShuffleMemoryManager, which is used to allocate memory across tasks in one executor. These 2 tasks have to share the 2G heap memory. I don't think specifying the memory per task is a good idea, as a task runs at the thread level, and memory only
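
To make the sharing concrete, here is a minimal Python sketch of how one pool of shuffle memory could be divided among the task threads of a single executor. It is not Spark's ShuffleMemoryManager code: the class SharedShufflePool and its methods are hypothetical, and the policy (no task may hold more than 1/N of the pool while N tasks are registered) is an assumption loosely modeled on a fair-share idea. A grant of 0 stands for "spill to disk"; blocking and waiting between threads are left out.

```python
class SharedShufflePool:
    """Toy fair-share pool; hypothetical, not Spark's implementation."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.held = {}  # task id -> bytes currently granted to that task

    def register(self, task_id):
        # A task counts towards N as soon as it is registered.
        self.held.setdefault(task_id, 0)

    def try_acquire(self, task_id, num_bytes):
        """Grant up to num_bytes; returning 0 means the task should spill."""
        self.register(task_id)
        n = len(self.held)
        cap = self.max_bytes // n  # assumed policy: at most 1/N per task
        free = self.max_bytes - sum(self.held.values())
        grant = max(0, min(num_bytes, cap - self.held[task_id], free))
        self.held[task_id] += grant
        return grant

    def release_all(self, task_id):
        # Called when a task finishes; its share becomes free again.
        self.held.pop(task_id, None)
```

With two registered tasks and a 1000-byte pool, a request for 800 bytes is cut to 500; once the other task finishes and is released, the remaining task may grow further.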

Re: How does spark manage the memory of executor with multiple tasks

2015-05-27 Thread canan chen
Can anyone answer my question? I am curious: if there are multiple reducer tasks in one executor, how is memory allocated between these reducer tasks, since each shuffle will consume a lot of memory? On Tue, May 26, 2015 at 7:27 PM, Evo Eftimov evo.efti...@isecc.com wrote: the link

RE: How does spark manage the memory of executor with multiple tasks

2015-05-26 Thread Evo Eftimov
An Executor is a JVM instance spawned and running on a Cluster Node (Server machine). A Task is essentially a JVM Thread – you can have as many Threads as you want per JVM. You will also hear about “Executor Slots” – these are essentially the CPU Cores available on the machine and granted for use

RE: How does spark manage the memory of executor with multiple tasks

2015-05-26 Thread Evo Eftimov
This is the first time I have heard that “one can specify the RAM per task” – the RAM is granted per Executor (JVM). On the other hand, each Task operates on ONE RDD Partition – so you can say that this is “the RAM allocated to the Task to process” – but it is still within the boundaries allocated to
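
As a rough illustration of the point that RAM is granted per executor and only shared among its tasks, the Python sketch below estimates how many tasks run concurrently in one executor and the average heap each would see. The helper names are hypothetical; the inputs are meant to correspond to the settings spark.executor.memory, spark.executor.cores and spark.task.cpus. The per-task figure is only an average, not a limit that Spark enforces.

```python
def concurrent_tasks(executor_cores, task_cpus=1):
    """Number of task threads that can run at once in one executor."""
    if executor_cores <= 0 or task_cpus <= 0:
        raise ValueError("core counts must be positive")
    slots = executor_cores // task_cpus
    if slots == 0:
        raise ValueError("a task needs more CPUs than the executor has")
    return slots


def average_heap_per_task_mb(executor_memory_mb, executor_cores, task_cpus=1):
    """Average share of the executor heap per concurrent task (not a cap)."""
    return executor_memory_mb / concurrent_tasks(executor_cores, task_cpus)
```

For example, an executor with a 2048 MB heap and 2 cores runs 2 tasks at a time, so each task sees about 1024 MB on average, while either one may use more if the other uses less.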

Re: How does spark manage the memory of executor with multiple tasks

2015-05-26 Thread canan chen
Yes, I know that one task corresponds to a JVM thread. This is what confuses me. Usually users want to specify memory at the task level, so how can I do that if a task is at the thread level and multiple tasks run in the same executor? I don't even know how many threads there will be. Besides that, if one

Re: How does spark manage the memory of executor with multiple tasks

2015-05-26 Thread canan chen
I think the concept of a task in Spark should be on the same level as a task in MR. Usually in MR, we need to specify the memory for each mapper/reducer task. And I believe the executor is not a user-facing concept; it's a Spark-internal concept. Spark users don't need to know the concept of

Re: How does spark manage the memory of executor with multiple tasks

2015-05-26 Thread Arush Kharbanda
Hi Evo, the Worker is the JVM and an executor runs on the JVM. And after Spark 1.4 you will be able to run multiple executors on the same JVM/worker. https://issues.apache.org/jira/browse/SPARK-1706. Thanks, Arush On Tue, May 26, 2015 at 2:54 PM, canan chen ccn...@gmail.com wrote: I think the

Re: How does spark manage the memory of executor with multiple tasks

2015-05-26 Thread Evo Eftimov
The link you sent says multiple executors per node. The Worker is just a daemon process launching Executors / JVMs so it can execute tasks; it does that by cooperating with the master and the driver. There is a one-to-one mapping between Executor and JVM.