[ https://issues.apache.org/jira/browse/HADOOP-5964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721321#action_12721321 ]
Hemanth Yamijala commented on HADOOP-5964:
------------------------------------------

I am looking at this patch as comprising three separate parts:
- Changes to the scheduler to fix the under-utilization problem in the face of high-RAM jobs.
- The new TaskTracker class, its lifecycle, and the changes in JobTracker to support it.
- The changes to the old TaskTracker class to account for the number of slots.

I've currently done the first part and partly the second. Some comments so far:

TaskTrackerStatus:
- countOccupiedMapSlots: the check for whether a task is running, based on its status, seems complicated enough to move into an API that can be called from both countMapTasks and this method. That way, any change to the check gives the right behavior in both APIs. Likewise for reduces. (A sketch of this refactoring follows this comment.)

mapreduce.TaskTracker:
- reserveSlots: the javadoc refers to reserving only 'map' slots.
- Why do we need to maintain a count of reserved slots (numFallowMapSlots)? I see that the accessor API is not used anywhere.

CapacityTaskScheduler:
- Why are we reserving all the available slots on the tasktracker? Shouldn't we always reserve only as much as this job requires? In that case, do we need re-reservation at all?
- When we try to get a task for a job ignoring user limits (i.e. when the cluster is free), we are not reserving TTs. Is this by design? Also, is it for the same reason that we are not checking user limits when assigning a task to a reserved TT?

JobConf:
- Since computeNumSlotsPerMap is used only by the CapacityScheduler right now, should we just leave this computation out of JobConf? (The slot arithmetic is sketched below for reference.)

JobInitializationPoller:
- Let's not pass the scheduler instance to the poller. I think it only needs the number of map slots and reduce slots, so we can pass just that much. We've seen in the past that passing entire objects like the scheduler makes classes difficult to test, and not all of the information is required anyway.

JobTracker:
- When a job is killed, we are not clearing the trackers reserved for it. (A sketch of the cleanup follows this comment.)
- Likewise, when a TT is blacklisted, do we need to remove its reservations?
- It seems the changes in JobTracker could be reduced a little if we did not change the APIs that are passed a TaskTrackerStatus object or a tasktracker name. We can still change the maps to be keyed on TaskTracker objects, but retrieve the status wherever necessary and pass it to the methods. That way the changes may be fewer and easier to verify. For example, I think this is possible in the ExpireTrackers class.

Some nits:
- In some places, lines run longer than 80 characters (e.g. mapreduce.TaskTracker.java).
- There are a lot of LOG.info statements, presumably added for testing/debugging. Can you please remove them?
- 'Fallow' seems a complicated word to understand. Is 'Reserved' good enough?

Will continue with the review...
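To make the TaskTrackerStatus suggestion concrete, here is a minimal sketch of what a shared running-check could look like inside TaskTrackerStatus. The helper name isTaskRunning(), the exact set of states it tests, and the getNumSlots() accessor on TaskStatus are assumptions made for illustration, not the patch's actual code.

{code}
// Sketch only: a shared predicate so that countMapTasks() and
// countOccupiedMapSlots() cannot drift apart. The states checked here are
// an assumption about what "running" should mean for both counts.
private static boolean isTaskRunning(TaskStatus ts) {
  TaskStatus.State state = ts.getRunState();
  return state == TaskStatus.State.RUNNING
      || state == TaskStatus.State.UNASSIGNED
      || ts.inTaskCleanupPhase();
}

public int countMapTasks() {
  int count = 0;
  for (TaskStatus ts : getTaskReports()) {
    if (ts.getIsMap() && isTaskRunning(ts)) {
      count++;
    }
  }
  return count;
}

public int countOccupiedMapSlots() {
  int slots = 0;
  for (TaskStatus ts : getTaskReports()) {
    if (ts.getIsMap() && isTaskRunning(ts)) {
      slots += ts.getNumSlots();  // assumed accessor: slots occupied by this task
    }
  }
  return slots;
}
{code}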
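For reference, the slot computation discussed under JobConf boils down to a ceiling division of the task's memory requirement by the cluster's per-slot memory. The standalone method below is only an illustration of that arithmetic; the name and signature are not the patch's actual API.

{code}
// Illustrative sketch of the per-task slot arithmetic: a task asking for
// more memory than one slot provides occupies multiple slots.
static int computeNumSlots(long taskMemoryMB, long memoryPerSlotMB) {
  if (taskMemoryMB <= 0 || memoryPerSlotMB <= 0) {
    return 1;  // no usable memory configuration: occupy a single slot
  }
  return (int) Math.ceil((double) taskMemoryMB / memoryPerSlotMB);
}
{code}

For example, a map task configured for 4096 MB on a cluster whose map slots are 2048 MB each occupies ceil(4096 / 2048) = 2 map slots, which is why a high-RAM job uses up a tracker's free slots faster than a regular job.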
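On the point about killed jobs leaving reservations behind, the kind of cleanup being asked for might look like the sketch below. taskTrackers(), unreserveSlots() and the use of TaskType are assumed names here, chosen only to make the suggestion concrete; they are not claimed to be the patch's API.

{code}
// Hypothetical sketch: when a job completes or is killed (and arguably when
// a TT is blacklisted), walk the trackers and drop any slots reserved for it.
void releaseReservedSlots(JobInProgress job) {
  for (TaskTracker tt : taskTrackers()) {
    tt.unreserveSlots(TaskType.MAP, job);
    tt.unreserveSlots(TaskType.REDUCE, job);
  }
}
{code}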
> Fix the 'cluster drain' problem in the Capacity Scheduler wrt High RAM Jobs
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-5964
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5964
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>    Affects Versions: 0.20.0
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 0.21.0
>
>         Attachments: HADOOP-5964_0_20090602.patch, HADOOP-5964_1_20090608.patch, HADOOP-5964_2_20090609.patch, HADOOP-5964_4_20090615.patch, HADOOP-5964_6_20090617.patch, HADOOP-5964_7_20090618.patch
>
>
> When a HighRAMJob turns up at the head of the queue, the current implementation of support for HighRAMJobs in the Capacity Scheduler has a problem: the scheduler stops assigning tasks to all TaskTrackers in the cluster until the HighRAMJob finds suitable TaskTrackers for all its tasks.
> This causes a severe utilization problem, since effectively no new tasks are allowed to run until the HighRAMJob (at the head of the queue) gets its slots.