[ https://issues.apache.org/jira/browse/HADOOP-5964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arun C Murthy updated HADOOP-5964:
----------------------------------

    Attachment: HADOOP-5964_8_20090618.patch

Thanks for the review, Hemanth. As you pointed out, the patch needs a bit more work to remove logging etc. I'm attaching a patch which incorporates your feedback.

Some clarifications:

{quote}
TaskTrackerStatus:
* countOccupiedMapSlots: the check for whether a task is running, based on its status, seems complicated enough to move to an API that can be called from both countMapTasks and this API. This way, any changes to it will cause the right behavior for both APIs. Likewise, for reduces.

mapreduce.TaskTracker:
* reserveSlots: java doc refers to reserving on 'map' slots.
* Why do we need to maintain a count of slots reserved (numFallowMapSlots)? I see that the accessor API is not used anywhere.
{quote}

Fixed.

bq. * Why are we reserving available slots on the tasktracker. Shouldn't we always be reserving only how much this job requires ? In that case, do we need a re-reservation ?

We reserve all available slots since, by definition, all of them are for the same task; if the task could run right away, we wouldn't reserve at all. We need 're-reservation' since #reserved-slots (on the same tasktracker) might change over time and we need to track these for metering (JobCounter.FALLOW_SLOTS_MILLIS_{MAPS|REDUCES}).

bq. * When we try to get a task for a job ignoring user limits (i.e. if the cluster is free), we are not reserving TTs. Is this by design ? Also, is it for the same reason that we are not checking for user limits when assigning a task to a reserved TT ?

Yes.

bq. * Let's not pass the scheduler instance to the poller. I think it only needs the number of map slots and reduce slots. We can pass just that much. We've seen in the past that passing entire objects like the scheduler makes testing classes difficult. Also, not all information is required.

Done. I've added a JobInitializationPoller.JobInitializationContext and use that rather than passing the scheduler.

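For illustration only, here is a minimal sketch of why a changing reserved-slot count on the same tasktracker calls for re-reservation when metering fallow slots. The class FallowSlotMeter and all of its method names are hypothetical and are not taken from the patch; the sketch only assumes that fallow time is billed as (reserved slots) x (milliseconds held), which is what a counter such as JobCounter.FALLOW_SLOTS_MILLIS_{MAPS|REDUCES} suggests.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch, not the code in the attached patch: tracks slots
 * reserved per tasktracker and accumulates slot-milliseconds held fallow.
 */
class FallowSlotMeter {
    /** One outstanding reservation on a single tracker. */
    private static final class Reservation {
        final int slots;
        final long sinceMillis;

        Reservation(int slots, long sinceMillis) {
            this.slots = slots;
            this.sinceMillis = sinceMillis;
        }
    }

    private final Map<String, Reservation> reservations = new HashMap<>();
    private long fallowSlotMillis = 0L;

    /** Reserve, or re-reserve, the given number of slots on a tracker. */
    void reserve(String tracker, int slots, long nowMillis) {
        // On re-reservation, first bill the interval held at the old slot
        // count, so that a changed count never rewrites past usage.
        unreserve(tracker, nowMillis);
        reservations.put(tracker, new Reservation(slots, nowMillis));
    }

    /** Drop the reservation on a tracker (if any) and bill the time held. */
    void unreserve(String tracker, long nowMillis) {
        Reservation r = reservations.remove(tracker);
        if (r != null) {
            fallowSlotMillis += (long) r.slots * (nowMillis - r.sinceMillis);
        }
    }

    long getFallowSlotMillis() {
        return fallowSlotMillis;
    }

    int getReservedTrackerCount() {
        return reservations.size();
    }
}
```

With this sketch, reserving 2 slots at t=1000, re-reserving 3 slots on the same tracker at t=2000, and releasing at t=4000 bills 2*1000 + 3*2000 = 8000 slot-milliseconds. Without the re-reservation step, the whole interval would be billed at a single slot count.
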
{quote}
JobTracker:
* When a job is killed, we are not clearing reserved trackers for this job.
* Likewise, when a TT is blacklisted do we need to remove the reservations ?
{quote}

My bad. Thanks for catching this. Fixed.

bq. It seems like the changes in JobTracker can be reduced a little if we do not change APIs that are passed a TTstatus object or a tasktracker name. We can still change the maps to be built of TaskTracker objects, but retrieve the status wherever necessary and pass it to methods. This way the changes may be fewer and easier to verify. For e.g. I think this is possible in the ExpireTrackers class.

I really don't think it's a good idea to use both TaskTracker and TaskTrackerStatus in the long run; it's really hard to maintain. That is why I bit the bullet and changed all of them.

> Fix the 'cluster drain' problem in the Capacity Scheduler wrt High RAM Jobs
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-5964
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5964
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>    Affects Versions: 0.20.0
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 0.21.0
>
>         Attachments: HADOOP-5964_0_20090602.patch,
> HADOOP-5964_1_20090608.patch, HADOOP-5964_2_20090609.patch,
> HADOOP-5964_4_20090615.patch, HADOOP-5964_6_20090617.patch,
> HADOOP-5964_7_20090618.patch, HADOOP-5964_8_20090618.patch
>
>
> When a HighRAMJob turns up at the head of the queue, the current
> implementation of support for HighRAMJobs in the Capacity Scheduler has a
> problem in that the scheduler stops assigning tasks to all TaskTrackers in
> the cluster until the HighRAMJob finds suitable TaskTrackers for all its
> tasks.
> This causes a severe utilization problem since effectively no new tasks are
> allowed to run until the HighRAMJob (at the head of the queue) gets slots.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.