Thomas Graves created SPARK-21695:

             Summary: Spark scheduler locality algorithm can take longer than 
                 Key: SPARK-21695
             Project: Spark
          Issue Type: Bug
          Components: Scheduler
    Affects Versions: 2.1.0
            Reporter: Thomas Graves

Reference jira

I'm seeing an issue with some jobs where the scheduler takes a long time to 
schedule tasks on executors. The default locality wait is 3 seconds, so I was 
expecting an executor to get some task within at most 9 seconds (node 
local, rack local, any), but it's taking far longer than that. In the case 
of SPARK-21656 it takes 60+ seconds and the executors hit the idle timeout.

We should investigate why and see if we can fix this.

On an initial look, it seems the scheduler resets the locality lastLaunchTime 
whenever it places any task on a node at that locality level. It appears this 
means it can take much longer than 3 seconds for any particular task to fall 
back to the next level, but this needs to be verified.
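
To make the suspected behavior concrete, here is a small Python toy model. It 
is only a sketch under stated assumptions, not the actual Spark code: the class 
LocalityTimer, the function seconds_until_any, the one-launch-per-second 
workload, and the uniform 3 second wait per level are all hypothetical and 
chosen for illustration. It compares a timer that is reset on every launch 
with one that is not.

```python
# Toy model of delay scheduling. All names here are hypothetical; this is
# NOT Spark's TaskSetManager, only an illustration of the suspected effect.
LEVELS = ["PROCESS_LOCAL", "NODE_LOCAL", "RACK_LOCAL", "ANY"]
WAIT_SECONDS = 3  # assumed: every level uses the default 3 second wait


class LocalityTimer:
    def __init__(self, reset_on_launch):
        self.reset_on_launch = reset_on_launch
        self.index = 0              # current position in LEVELS
        self.last_launch_time = 0   # seconds

    def allowed_level(self, now):
        # Relax by one level for each full wait that has elapsed
        # since last_launch_time.
        while (self.index < len(LEVELS) - 1
               and now - self.last_launch_time >= WAIT_SECONDS):
            self.last_launch_time += WAIT_SECONDS
            self.index += 1
        return LEVELS[self.index]

    def record_launch(self, now, level):
        # Suspected behavior: any launch restarts the wait at that level.
        if self.reset_on_launch:
            self.index = LEVELS.index(level)
            self.last_launch_time = now


def seconds_until_any(reset_on_launch, launch_interval=1, horizon=60):
    """Return the first second at which ANY is allowed, or None if it is
    never allowed within `horizon` seconds."""
    timer = LocalityTimer(reset_on_launch)
    for now in range(horizon + 1):
        if timer.allowed_level(now) == "ANY":
            return now
        # Assumed workload: a busy executor frees a slot every
        # `launch_interval` seconds and takes a PROCESS_LOCAL task.
        if now > 0 and now % launch_interval == 0:
            timer.record_launch(now, "PROCESS_LOCAL")
    return None


if __name__ == "__main__":
    print("without reset:", seconds_until_any(reset_on_launch=False))
    print("with reset:   ", seconds_until_any(reset_on_launch=True))
```

In this model the timer without the reset allows ANY after 9 seconds (three 
waits of 3 seconds), while the timer with the reset never gets there within 
the 60 second horizon, because some local task launches at least once per 
wait period. An idle executor with no local data would sit unused for that 
whole time. Whether the real scheduler behaves this way still needs to be 
verified.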
