reduce overhead of sorting jobs/pools in FairScheduler heartbeat processing
---------------------------------------------------------------------------
Key: MAPREDUCE-2048
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2048
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: contrib/fair-share
Reporter: Joydeep Sen Sarma
Throughput at the JobTracker is bounded by the jobtracker lock. The sorting of
jobs (and of pools, in hadoop-trunk) done by the FairScheduler happens once per
heartbeat while this lock is held. This shows up as one of the places where we
spend a lot of time holding the jobtracker lock.
We can avoid sorting the jobs/pools on every heartbeat and instead do the sort
in the updateThread (which is invoked periodically). The sorted set can then be
maintained incrementally: as jobs/pools are scheduled in each heartbeat, we can
delete and re-insert the affected entries in the sorted set.
This may be less of an issue in trunk (where we sort pools and then sort jobs
within a pool) than in hadoop-20 (where we sort all jobs). However, in our
workload we have lots of pools (one per user) and lots of jobs in some pools
(the production pools), so I think it's reasonable to assume this is worth
addressing in trunk as well.
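To illustrate the proposal, here is a minimal sketch of the incremental approach. The class and field names (JobInfo, deficit, and the deficit adjustment) are hypothetical, not the real FairScheduler types: the point is that the heartbeat path only removes and re-inserts the one job it touched (O(log n)) instead of re-sorting everything (O(n log n)), while the periodic update thread remains free to recompute keys in bulk.

```java
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical sketch of maintaining job order incrementally. A TreeSet
// keeps jobs sorted by a scheduling key; the heartbeat path does a
// delete/re-insert of only the job it just scheduled.
public class IncrementalJobOrder {
    static class JobInfo {
        final String name;
        long deficit; // illustrative sort key, maintained by the update thread

        JobInfo(String name, long deficit) {
            this.name = name;
            this.deficit = deficit;
        }
    }

    // Order by descending deficit, breaking ties by name for determinism.
    private final TreeSet<JobInfo> sorted = new TreeSet<>(
        Comparator.comparingLong((JobInfo j) -> -j.deficit)
                  .thenComparing(j -> j.name));

    public void add(JobInfo j) {
        sorted.add(j);
    }

    // Called from the heartbeat path after scheduling a task for the head
    // job: remove it, adjust its key, re-insert it. The set stays sorted
    // without re-sorting every element.
    public JobInfo scheduleHead(long deficitDecrease) {
        JobInfo head = sorted.pollFirst();
        if (head != null) {
            head.deficit -= deficitDecrease;
            sorted.add(head); // O(log n) re-insert
        }
        return head;
    }

    public String headName() {
        return sorted.first().name;
    }

    public static void main(String[] args) {
        IncrementalJobOrder order = new IncrementalJobOrder();
        order.add(new JobInfo("prod-etl", 100));
        order.add(new JobInfo("adhoc-query", 60));
        // prod-etl has the larger deficit, so it is scheduled first;
        // once its deficit drops to 50, adhoc-query moves to the head.
        System.out.println(order.scheduleHead(50).name); // prod-etl
        System.out.println(order.headName());            // adhoc-query
    }
}
```

In the real scheduler, the update thread would still need to synchronize with heartbeats when it recomputes keys in bulk, since mutating a key of an element already inside a TreeSet without removing it first would corrupt the ordering.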
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.