[ https://issues.apache.org/jira/browse/SPARK-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092807#comment-14092807 ]

Mridul Muralidharan commented on SPARK-2962:
--------------------------------------------

On further investigation:

a) The primary issue is a combination of SPARK-2089 and the current scheduler 
behavior for pendingTasksWithNoPrefs.
SPARK-2089 leads to very poor allocation of nodes - particularly on bigger 
clusters.
It leaves a lot of blocks with no data-local or rack-local executors, causing 
their tasks to end up in pendingTasksWithNoPrefs.

While loading data off DFS, when an executor is being scheduled, tasks from 
pendingTasksWithNoPrefs get scheduled even though rack-local schedules might be 
available for it (or, after waiting a while, data-local ones too - see (b) 
below). Because of this scheduler behavior, a large number of ANY tasks get 
scheduled at the very onset (see the sketch below).

The combination of these, with no marginal alleviation via (b), is what caused 
the performance impact.
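
To make the ordering concrete, here is a minimal self-contained sketch of the 
lookup order being described. Names loosely follow TaskSetManager.findTask, but 
this is illustrative pseudocode of the pre-fix behavior, not the actual 
implementation:

{code}
// Illustrative pseudocode only - names loosely follow TaskSetManager, but this
// is NOT the actual Spark implementation.
object TaskLocality extends Enumeration {
  // Ordered from most local to least local.
  val PROCESS_LOCAL, NODE_LOCAL, RACK_LOCAL, ANY = Value
  def isAllowed(constraint: Value, condition: Value): Boolean = condition <= constraint
}

class SketchTaskSet(
    pendingForExecutor: Map[String, Seq[Int]],
    pendingForHost: Map[String, Seq[Int]],
    pendingForRack: Map[String, Seq[Int]],
    pendingWithNoPrefs: Seq[Int],
    allPending: Seq[Int]) {

  import TaskLocality._

  // Simplified lookup order: executor-local, then host-local, then the
  // no-prefs queue, then rack-local, then anything.
  def findTask(execId: String, host: String, rack: Option[String],
               maxLocality: Value): Option[(Int, Value)] = {
    def first(queue: Seq[Int]): Option[Int] = queue.headOption

    first(pendingForExecutor.getOrElse(execId, Nil)).map(i => (i, PROCESS_LOCAL))
      .orElse {
        if (isAllowed(maxLocality, NODE_LOCAL))
          first(pendingForHost.getOrElse(host, Nil)).map(i => (i, NODE_LOCAL))
        else None
      }
      .orElse {
        // The problematic step: no-pref tasks jump in ahead of rack-local tasks
        // and are labelled PROCESS_LOCAL irrespective of the requested locality.
        first(pendingWithNoPrefs).map(i => (i, PROCESS_LOCAL))
      }
      .orElse {
        if (isAllowed(maxLocality, RACK_LOCAL))
          rack.flatMap(r => first(pendingForRack.getOrElse(r, Nil))).map(i => (i, RACK_LOCAL))
        else None
      }
      .orElse {
        if (isAllowed(maxLocality, ANY)) first(allPending).map(i => (i, ANY))
        else None
      }
  }
}

object Demo extends App {
  // A freshly started app where most block locations are not alive yet
  // (SPARK-2089), so the tasks sit in the no-prefs queue.
  val ts = new SketchTaskSet(
    pendingForExecutor = Map.empty,
    pendingForHost = Map.empty,
    pendingForRack = Map.empty,
    pendingWithNoPrefs = Seq(0, 1, 2),
    allPending = Seq(0, 1, 2))

  // Even when the caller only asks for NODE_LOCAL, a no-pref task comes back
  // tagged PROCESS_LOCAL.
  println(ts.findTask("exec-1", "host-1", Some("rack-1"), TaskLocality.NODE_LOCAL))
}
{code}

The third orElse step is the one at issue: no-pref tasks are consulted before 
rack-local tasks and reported as PROCESS_LOCAL, so on a freshly started app 
they dominate the initial schedule as effectively ANY placements.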

b) spark.scheduler.minRegisteredExecutorsRatio had not yet been used in the 
workload - using it might alleviate some of the non-deterministic waiting and 
ensure adequate executors are allocated before scheduling starts! Thanks [~lirui]
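
For anyone hitting the same behavior, a sketch of wiring that up is below. The 
ratio property is the one named above; the application name and the companion 
wait-time property (its name and default) are assumptions and may differ by 
Spark version:

{code}
import org.apache.spark.{SparkConf, SparkContext}

object MinRegisteredExecutorsExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("locality-sensitive-job")
      // Hold off task scheduling until ~80% of the requested executors have
      // registered, so fewer tasks land in pendingTasksWithNoPrefs / ANY at startup.
      .set("spark.scheduler.minRegisteredExecutorsRatio", "0.8")
      // Assumed companion setting (milliseconds) capping how long to wait for
      // executors; name and default may differ between Spark versions.
      .set("spark.scheduler.maxRegisteredExecutorsWaitingTime", "30000")

    val sc = new SparkContext(conf)
    try {
      // ... job body ...
    } finally {
      sc.stop()
    }
  }
}
{code}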



> Suboptimal scheduling in spark
> ------------------------------
>
>                 Key: SPARK-2962
>                 URL: https://issues.apache.org/jira/browse/SPARK-2962
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.1.0
>         Environment: All
>            Reporter: Mridul Muralidharan
>
> In findTask, irrespective of the 'locality' level specified, tasks from 
> pendingTasksWithNoPrefs are always scheduled with PROCESS_LOCAL.
> pendingTasksWithNoPrefs contains tasks which currently do not have any alive 
> locations - but which could acquire them 'later': this is particularly 
> relevant when a spark app is just coming up and containers are still being 
> added.
> This causes a large number of non node-local tasks to be scheduled, incurring 
> significant network transfers in the cluster when running with non-trivial 
> datasets.
> The comment "// Look for no-pref tasks after rack-local tasks since they can 
> run anywhere." in the method is misleading: locality levels are tried from 
> PROCESS_LOCAL down to ANY, so no-pref tasks get scheduled well before 
> rack-local ones.
> Also note that currentLocalityIndex is reset to the taskLocality returned by 
> this method - so returning PROCESS_LOCAL as the level will trigger wait times 
> again. (This was relevant before the recent change to the scheduler, and might 
> be again depending on how this issue is resolved.)
> Found as part of writing a test for SPARK-2931
>  



