[ 
https://issues.apache.org/jira/browse/SPARK-22765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16300926#comment-16300926
 ] 

Xuefu Zhang commented on SPARK-22765:
-------------------------------------

I benchmarked upfront allocation against exponential ramp-up using a set of 20 
queries. No clear trend emerged: upfront allocation was more efficient for 
some queries, about equally efficient for others, and less efficient for the 
rest. These variations might just be noise, which is abundant in our 
production cluster. Thus, I tend to agree that upfront allocation offers 
limited benefit for efficiency, if any. (On the other hand, it does seem to 
benefit performance somewhat.)
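
For readers unfamiliar with the two strategies, here is a rough sketch of how 
the number of requested executors grows under each. It assumes the ramp-up 
adds 1, 2, 4, 8, ... executors per round, as dynamic allocation is documented 
to do. The function names and the notion of a fixed "target" are illustrative 
only and are not Spark APIs.

```python
def exponential_ramp_up(target, initial=0):
    """Cumulative executor count after each request round.

    Assumes the number added doubles every round (1, 2, 4, ...)
    and is capped at `target`. Illustrative only, not Spark code.
    """
    totals = []
    current = initial
    to_add = 1
    while current < target:
        current = min(target, current + to_add)
        totals.append(current)
        to_add *= 2
    return totals


def upfront(target):
    """Upfront allocation asks for the whole target in a single round."""
    return [target] if target > 0 else []


if __name__ == "__main__":
    print(exponential_ramp_up(20))  # [1, 3, 7, 15, 20]
    print(upfront(20))              # [20]
```

With a target of 20 executors, the ramp-up needs five rounds while upfront 
allocation needs one, which is consistent with upfront allocation helping 
latency more than efficiency.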

I also noticed that when the scheduler schedules a task, it doesn't necessarily 
pick a free core on an executor that is already running other tasks. I 
speculate that efficiency would improve if busy executors were favored for new 
tasks, so that the remaining idle executors can idle out. (To be tested.)
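
To make the idea concrete, here is a toy comparison of a "packed" placement 
policy (favor busy executors) with a "spread" policy (favor the least loaded 
executor). This is a hypothetical sketch, not Spark's scheduler code: the 
Executor class, the policy functions, and the 3-executor cluster are all made 
up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Executor:
    executor_id: str
    total_cores: int
    running_tasks: int = 0

    @property
    def free_cores(self):
        return self.total_cores - self.running_tasks


def pick_packed(executors):
    """Prefer the busiest executor that still has a free core."""
    candidates = [e for e in executors if e.free_cores > 0]
    if not candidates:
        return None
    # Most running tasks first; tie-break on id for determinism.
    return min(candidates, key=lambda e: (-e.running_tasks, e.executor_id))


def pick_spread(executors):
    """Prefer the least loaded executor (spreads tasks out)."""
    candidates = [e for e in executors if e.free_cores > 0]
    if not candidates:
        return None
    return min(candidates, key=lambda e: (e.running_tasks, e.executor_id))


def schedule(executors, num_tasks, pick):
    """Place up to num_tasks tasks, one core each; return how many were placed."""
    placed = 0
    for _ in range(num_tasks):
        target = pick(executors)
        if target is None:
            break  # no free core anywhere
        target.running_tasks += 1
        placed += 1
    return placed


def idle_executors(executors):
    """Ids of executors with no running task (candidates to idle out)."""
    return [e.executor_id for e in executors if e.running_tasks == 0]


def make_cluster():
    # Hypothetical cluster: 3 executors with 4 cores each.
    return [Executor("e1", 4), Executor("e2", 4), Executor("e3", 4)]


if __name__ == "__main__":
    packed, spread = make_cluster(), make_cluster()
    schedule(packed, 4, pick_packed)
    schedule(spread, 4, pick_spread)
    print(idle_executors(packed))  # ['e2', 'e3']
    print(idle_executors(spread))  # []
```

With 4 tasks on this cluster, the packed policy leaves two executors fully 
idle, so they could be released, whereas the spread policy keeps all three 
busy. Whether this helps in practice (data locality, for example, is ignored 
here) still needs to be tested.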

Making idleTime=0 valid is a good thing to have. I will create a separate 
ticket for that.


> Create a new executor allocation scheme based on that of MR
> -----------------------------------------------------------
>
>                 Key: SPARK-22765
>                 URL: https://issues.apache.org/jira/browse/SPARK-22765
>             Project: Spark
>          Issue Type: Improvement
>          Components: Scheduler
>    Affects Versions: 1.6.0
>            Reporter: Xuefu Zhang
>
> Many users migrating their workloads from MR to Spark find a significant 
> hike in resource consumption (see SPARK-22683). While this might not be a 
> concern for users who are more performance-centric, for cost-conscious 
> users such a hike creates a migration obstacle. This situation can get 
> worse as more users move to the cloud.
> Dynamic allocation makes it possible for Spark to be deployed in a 
> multi-tenant environment. Because its design is performance-centric, 
> however, its inefficiency has also shown up, especially when compared with 
> MR. Thus, it is believed that an MR-style scheduler still has its merits. 
> Based on our research, the inefficiency associated with dynamic allocation 
> comes from several sources, such as executors idling out, bigger executors, 
> and many stages in a Spark job (rather than only 2 stages in MR).
> Rather than fine-tuning dynamic allocation for efficiency, the proposal here 
> is to add a new, efficiency-centric scheduling scheme based on that of MR. 
> Such an MR-based scheme can be further enhanced and adapted to Spark's 
> execution model. This alternative is expected to offer a good performance 
> improvement over MR while keeping efficiency similar to, or even better 
> than, that of MR.
> Input is greatly welcome!



