Like you, there are lots of people coming from the MapReduce world and trying to
understand the internals of Spark. I hope the notes below help in some way.
For end users, the only visible concept is the Job: I want to run a word count
job over this one big file; that is the job I want to run. How many
Thanks Yong, this is very helpful. I also found ShuffleMemoryManager, which is
used to allocate memory across tasks within one executor.
These 2 tasks have to share the 2G heap memory. I don't think specifying
memory per task is a good idea, as a task runs at the thread level, and memory only
Can anyone answer my question? I am curious to know: if there are
multiple reducer tasks in one executor, how is memory allocated between
these reducer tasks, since each shuffle consumes a lot of memory?
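Since the thread mentions ShuffleMemoryManager: as I understand it, Spark shares the shuffle pool fairly across the active task threads in an executor, so with N active tasks each task can hold at most 1/N of the pool, and a task that cannot get more memory is expected to spill to disk. Here is a minimal Python sketch of that fairness policy (the class and method names are made up for illustration; this is not Spark's actual code):

```python
class ShuffleMemoryPool:
    """Toy model of per-executor shuffle memory shared fairly across tasks."""

    def __init__(self, pool_bytes):
        self.pool = pool_bytes
        self.holdings = {}  # task_id -> bytes currently held

    def try_acquire(self, task_id, num_bytes):
        """Grant up to num_bytes, capped so no task exceeds 1/N of the pool.

        Returns the number of bytes actually granted; a grant smaller than
        requested is the signal that the task should spill.
        """
        self.holdings.setdefault(task_id, 0)
        n = len(self.holdings)                 # number of active tasks
        max_per_task = self.pool // n          # the 1/N cap
        free = self.pool - sum(self.holdings.values())
        grant = max(0, min(num_bytes, max_per_task - self.holdings[task_id], free))
        self.holdings[task_id] += grant
        return grant

    def release(self, task_id):
        """A finished task returns all of its memory to the pool."""
        self.holdings.pop(task_id, None)
```

For example, with a 1000-byte pool, a lone task can take 800 bytes, but a second task then sees a 1/2 cap and only 200 free bytes, so it is granted 200 and must spill for the rest; once the first task finishes and releases, the second can grow again.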
On Tue, May 26, 2015 at 7:27 PM, Evo Eftimov evo.efti...@isecc.com wrote:
An Executor is a JVM instance spawned and running on a Cluster Node (server
machine). A Task is essentially a JVM Thread – you can have as many Threads as
you want per JVM. You will also hear about “Executor Slots” – these are
essentially the CPU cores available on the machine and granted for use
This is the first time I have heard that “one can specify the RAM per task” – RAM
is granted per Executor (JVM). On the other hand, each Task operates on ONE RDD
Partition – so you could say that this is “the RAM allocated to the Task to
process” – but it is still within the boundaries allocated to
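To make the per-executor (rather than per-task) granting concrete, here is a sketch of a spark-submit invocation on YARN. The flag names are the standard Spark 1.x ones; the values, the application class, and the file paths are made up for illustration:

```shell
# Memory and cores are requested per Executor (JVM), never per task:
#   --executor-memory : heap shared by ALL task threads in that JVM
#   --executor-cores  : number of "slots", i.e. concurrent task threads
# com.example.WordCount and the paths below are hypothetical.
spark-submit \
  --master yarn \
  --class com.example.WordCount \
  --num-executors 4 \
  --executor-memory 2g \
  --executor-cores 2 \
  wordcount.jar hdfs:///input/big-file.txt
```

With these settings, up to 2 tasks run concurrently inside each executor and share that executor's 2g heap between them, which is exactly the situation described above.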
Yes, I know that one task is represented by a JVM thread. That is what confused
me. Usually users want to specify memory at the task level, so how can I do that
if a task is thread-level and multiple tasks run in the same executor? I don't
even know how many threads there will be. Besides that, if one
I think the concept of a task in Spark should be at the same level as a task in
MR. Usually in MR, we need to specify the memory for each mapper/reducer
task. And I believe the executor is not a user-facing concept; it's a Spark-internal
concept. Spark users don't need to know the concept of
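The contrast drawn here can be shown side by side. The property names below are the standard Hadoop MapReduce and Spark configuration keys; the values are illustrative only:

```
# Hadoop MapReduce: memory is requested per task
# (each mapper/reducer gets its own container)
mapreduce.map.memory.mb=2048
mapreduce.reduce.memory.mb=4096

# Spark: memory is requested per executor;
# all task threads in the executor share this heap
spark.executor.memory=4g
```

So the per-task knob that MR users are accustomed to has no direct equivalent in Spark; the closest control is sizing the executor and limiting how many tasks run in it concurrently.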
Hi Evo,
The Worker is a JVM, and an executor runs on that JVM. After Spark 1.4 you
will be able to run multiple executors on the same JVM/worker:
https://issues.apache.org/jira/browse/SPARK-1706.
Thanks
Arush
On Tue, May 26, 2015 at 2:54 PM, canan chen ccn...@gmail.com wrote:
I think the link you sent says multiple executors per node
The Worker is just a daemon process launching Executors / JVMs so it can execute
tasks – it does that by cooperating with the master and the driver.
There is a one-to-one mapping between Executor and JVM.
Sent from Samsung Mobile