[ https://issues.apache.org/jira/browse/SPARK-4940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14268440#comment-14268440 ]

Gerard Maas edited comment on SPARK-4940 at 1/7/15 10:53 PM:
-------------------------------------------------------------

Hi Tim,

We are indeed using coarse-grained mode. I'm not sure fine-grained mode makes 
much sense for Spark Streaming.

Here are a few examples of resource allocation, taken from several runs of the 
same job with identical configuration:
Job config:
spark.cores.max = 18
spark.mesos.coarse = true
spark.executor.memory = 4g
    
The job logic will start 6 Kafka receivers.
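
For reference, here's a minimal sketch of how such a job is wired up (the Zookeeper quorum, consumer group, topic name and batch interval below are placeholders for illustration, not our actual values):

{code:scala}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

// Same settings as listed above: coarse-grained Mesos mode with a global core cap.
val conf = new SparkConf()
  .setAppName("streaming-job")
  .set("spark.cores.max", "18")
  .set("spark.mesos.coarse", "true")
  .set("spark.executor.memory", "4g")

val ssc = new StreamingContext(conf, Seconds(10)) // batch interval is illustrative

// Start 6 Kafka receivers and union them into a single DStream.
// Each receiver permanently occupies one core on whichever executor it lands on.
// "zkhost:2181", "consumer-group" and "topic" are placeholders.
val kafkaStreams = (1 to 6).map { _ =>
  KafkaUtils.createStream(ssc, "zkhost:2181", "consumer-group", Map("topic" -> 1))
}
val unified = ssc.union(kafkaStreams)

unified.count().print() // some output operation so the job actually runs

ssc.start()
ssc.awaitTermination()
{code}
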

#1
--
|| Node || Mesos CPU || Mesos Mem || Spark tasks || Streaming receivers ||
| 1 | 4 |  4GB | 3  | 2  |
| 2 | 6 |  4GB | 2  | 1  | 
| 3 | 7 | 4GB  | 3  | 2  |
| 4 | 1 | 4GB | 1 | 1 |

Total mem: 16 GB
Total CPUs: 18

Observations: 
Node #4, with only 1 CPU and 1 Kafka receiver, does not have the capacity to 
process the data it receives, so all data it receives needs to be sent to other 
nodes for non-local processing (not sure whether replication helps in this case; 
the blocks of data are processed on other nodes). Also, the nodes with 2 
streaming receivers have a higher load than the node with 1 receiver.

#2
--
|| Node || Mesos CPU || Mesos Mem || Spark tasks || Streaming receivers ||
| 1 | 7 |  4GB | 7  | 4  |
| 2 | 2 |  4GB | 2  | 2  | 

Total mem: 8 GB
Total CPUs: 9

Observations: 
This is the worst configuration of the day: totally unbalanced (4 vs 2 
receivers) and, for some reason, the job didn't get all the resources specified 
in the configuration. The job processing time is also slower, as there are fewer 
cores to handle the data and less overall memory.

#3
--
|| Node || Mesos CPU || Mesos Mem || Spark tasks || Streaming receivers ||
| 1 | 3 |  4GB | 3  | 2  |
| 2 | 8 |  4GB | 2  | 2  | 
| 3 | 7 | 4GB  | 3  | 2  |

Total mem: 12 GB
Total CPUs: 18

Observations: 
This is a fairly good configuration, with receivers and CPUs more evenly 
distributed, although one node is considerably smaller in terms of CPU 
assignment.
 
We can observe that the current resource assignment policy results in 
less-than-ideal and, in particular, random assignments that have a strong impact 
on job execution and performance. Given that memory is allocated per executor 
(and not per job), and the number of executors depends on how the CPUs get 
spread over the nodes, the total memory for the job varies, as it can get 
anywhere from 2 to 4 executors assigned (see the quick calculation below). It's 
also weird and unexpected to observe allocations below the configured 
spark.cores.max.
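
As a quick sanity check on the memory numbers above, the total memory per run is just the number of executors times spark.executor.memory:

{code:scala}
// spark.executor.memory = 4g; the scheduler launched 4, 3 and 2 executors
// in runs #1, #3 and #2 respectively.
val executorMemoryGB = 4
Seq(4, 3, 2).map(_ * executorMemoryGB) // List(16, 12, 8) GB -- the totals listed per run
{code}
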
Here's a performance chart of the same job jumping from one config to another 
(*), one with 3 nodes (left) and one with 2 (right): 
!https://lh3.googleusercontent.com/Z1C71OKoQzGA13uNJ8Yvf_xz_glRUqU_IGGvLsfkPvUPK2lahrEatweiWl-PDDfysjXtbs1Sl_k=w1682-h689!
(chart line: processing time in ms; load is fairly constant)

(*) For a reason we haven't found yet, Mesos often kills the job. When Marathon 
relaunches it, it ends up with a different resource assignment.


> Support more evenly distributing cores for Mesos mode
> -----------------------------------------------------
>
>                 Key: SPARK-4940
>                 URL: https://issues.apache.org/jira/browse/SPARK-4940
>             Project: Spark
>          Issue Type: Improvement
>          Components: Mesos
>            Reporter: Timothy Chen
>
> Currently in coarse-grained mode the Spark scheduler simply takes all the 
> resources it can on each node, which can cause uneven distribution based on 
> the resources available on each slave.


