reduce overhead of sorting jobs/pools in FairScheduler heartbeat processing
---------------------------------------------------------------------------

                 Key: MAPREDUCE-2048
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2048
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: contrib/fair-share
            Reporter: Joydeep Sen Sarma


We are bottlenecked on the JobTracker by the jobtracker lock. The sorting of 
jobs (and of pools, in hadoop-trunk) done by the FairScheduler happens once 
per heartbeat while this lock is held. This shows up as one of the places 
where a lot of time is spent holding the jobtracker lock.

We can avoid sorting the jobs/pools on every heartbeat and instead sort in 
the updateThread (which is invoked periodically). The sorted set can then be 
maintained incrementally: as jobs/pools are scheduled in each heartbeat, the 
affected entries can be deleted from and re-inserted into the sorted set.
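A minimal sketch of the delete/re-insert idea, using a plain java.util.TreeSet. 
The Job class, the "deficit" field, and the comparator here are illustrative 
stand-ins, not the actual FairScheduler classes; the point is only that an 
entry must be removed before its sort key is mutated, then re-added, which 
costs O(log n) per scheduled job instead of an O(n log n) full sort per 
heartbeat:

```java
import java.util.Comparator;
import java.util.TreeSet;

public class IncrementalSortSketch {
    // Hypothetical stand-in for a scheduler job; "deficit" is a placeholder
    // for whatever key the scheduler actually sorts on.
    static final class Job {
        final String name;
        long deficit;
        Job(String name, long deficit) { this.name = name; this.deficit = deficit; }
    }

    // Highest deficit first; tie-break on name so two distinct jobs with the
    // same deficit are never collapsed into one TreeSet entry.
    static final Comparator<Job> BY_DEFICIT =
        Comparator.comparingLong((Job j) -> -j.deficit).thenComparing(j -> j.name);

    public static void main(String[] args) {
        TreeSet<Job> jobs = new TreeSet<>(BY_DEFICIT);
        Job a = new Job("a", 100);
        Job b = new Job("b", 50);
        jobs.add(a);
        jobs.add(b);
        // "a" has the larger deficit, so it heads the queue.
        assert jobs.first() == a;

        // A heartbeat schedules a task from "a", lowering its deficit.
        // Remove BEFORE mutating the sort key, then re-insert.
        jobs.remove(a);
        a.deficit = 10;
        jobs.add(a);
        // Now "b" heads the queue, without re-sorting the whole set.
        assert jobs.first() == b;
        System.out.println("head of queue: " + jobs.first().name);
    }
}
```

Note that mutating a key while the element is still inside the TreeSet would 
silently corrupt its ordering, so the remove-then-reinsert discipline is 
essential to this approach.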

This may be less of an issue in trunk (where we sort pools and then sort jobs 
within a pool) than in hadoop-20 (where we sort all jobs). However, in our 
workload we have many pools (one per user) and many jobs in some pools 
(production pools), so I think it's reasonable to assume this is worth 
addressing in trunk as well.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
