[ https://issues.apache.org/jira/browse/HADOOP-4035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12648541#action_12648541 ]
Vivek Ratan commented on HADOOP-4035:
-------------------------------------

As Vinod has brought up, there are some edge cases and details missing in [the summary|https://issues.apache.org/jira/browse/HADOOP-4035?focusedCommentId=12644267#action_12644267] that we need to cover.

We want monitoring to work independently of scheduler support, i.e., even if the scheduler you're using does not support memory-based scheduling, you may still want to make sure the TTs monitor memory usage on their machines and kill tasks that use too much memory. Based on what we've described in the summary, the following three configuration settings are required for the TT to do monitoring: {{mapred.tasktracker.virtualmemory.reserved}} (the offset subtracted from the total VM on the machine), {{mapred.task.default.maxvm}} (the default maximum VM per task), and {{mapred.task.limit.maxvm}} (the upper limit on the maximum VM per task). It is proposed that (see the sketch further below):
* If one or more of these three values are missing in the configuration, the TT disables monitoring and logs an appropriate message.
* At startup, the TT should also make sure that {{mapred.task.default.maxvm}} is not greater than {{mapred.task.limit.maxvm}}. If it is, the TT logs a message and disables monitoring.
* If all three are present, the TT has enough information to compute the _max-VM-per-task_ limit for each task it runs, and can successfully monitor memory usage.
* Without scheduler support, the TT can get a task whose _max-VM-per-task_ limit is higher than {{mapred.task.limit.maxvm}} (i.e., the user-set value for a job's {{mapred.task.maxvm}} can be higher than {{mapred.task.limit.maxvm}}). In such a case, the TT can choose to fail the task, or it can still run the task while logging the problem. IMO, the former seems too harsh, and not something the TT should decide based solely on its monitoring settings. In the latter case, the TT can continue monitoring, but may end up killing the wrong task if the sum of VM used is over the _max-VM-per-node_ limit. I propose we do the latter.

The TT also needs to report memory information to the schedulers. As per HADOOP-3759, TTs currently report, in each heartbeat, how much free VM they have (which is equal to _max-VM-per-node_ minus the sum of _max-VM-per-task_ for each running task, where _max-VM-per-node_ is the machine's total VM minus {{mapred.tasktracker.virtualmemory.reserved}}). This makes sense if monitoring is on and the three necessary VM config values are defined. If they're not, and the TT cannot determine its free VM, what should it report?
* It can report -1, or some such value, indicating that it cannot compute free VM.
* If we let schedulers decide how they want to behave in the absence of monitoring, or rather in the absence of the necessary VM config values, a TT should always report how much total VM (as well as RAM) it has, along with its value for {{mapred.tasktracker.virtualmemory.reserved}}.

I propose the latter: TTs always report how much VM and RAM they have on their system, and what offset settings they have. They're the only ones who have this information, and this approach gives the schedulers a lot of flexibility in how they use it.
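To make the TT-side proposal concrete, here is a minimal sketch of the startup checks and of the always-report-raw-numbers heartbeat. This is not the actual patch: the class name, the {{totalVmOnMachine()}}/{{totalRamOnMachine()}} helpers, and the log messages are hypothetical; only the three configuration keys and the proposed behavior come from this comment.

{code:java}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.conf.Configuration;

/** Hypothetical sketch of the proposed TT-side monitoring behavior. */
public class TTMemoryMonitoringSketch {

  private static final Log LOG =
      LogFactory.getLog(TTMemoryMonitoringSketch.class);
  private static final long NOT_SET = -1L;

  private long reservedVm;          // mapred.tasktracker.virtualmemory.reserved
  private long defaultMaxVmPerTask; // mapred.task.default.maxvm
  private long maxVmPerTaskLimit;   // mapred.task.limit.maxvm
  private boolean monitoringEnabled;

  /** Startup checks: disable monitoring (and log why) on bad or missing config. */
  void initializeMonitoring(Configuration conf) {
    reservedVm =
        conf.getLong("mapred.tasktracker.virtualmemory.reserved", NOT_SET);
    defaultMaxVmPerTask = conf.getLong("mapred.task.default.maxvm", NOT_SET);
    maxVmPerTaskLimit = conf.getLong("mapred.task.limit.maxvm", NOT_SET);

    if (reservedVm == NOT_SET || defaultMaxVmPerTask == NOT_SET
        || maxVmPerTaskLimit == NOT_SET) {
      // One or more of the three required values is missing.
      LOG.warn("Memory monitoring disabled: a required VM setting is missing.");
      monitoringEnabled = false;
    } else if (defaultMaxVmPerTask > maxVmPerTaskLimit) {
      // The default must not exceed the upper limit.
      LOG.warn("Memory monitoring disabled: mapred.task.default.maxvm > "
          + "mapred.task.limit.maxvm.");
      monitoringEnabled = false;
    } else {
      monitoringEnabled = true;
    }
  }

  boolean isMonitoringEnabled() { return monitoringEnabled; }

  /** max-VM-per-task for one task: the job's mapred.task.maxvm, else the default. */
  long maxVmForTask(long jobMaxVm) {
    long limit = (jobMaxVm == NOT_SET) ? defaultMaxVmPerTask : jobMaxVm;
    if (limit > maxVmPerTaskLimit) {
      // Proposed behavior without scheduler support: run the task anyway,
      // but log that its limit exceeds mapred.task.limit.maxvm.
      LOG.warn("Task limit " + limit + " exceeds mapred.task.limit.maxvm "
          + maxVmPerTaskLimit + "; running it anyway.");
    }
    return limit;
  }

  /**
   * Proposed heartbeat payload: always report the raw numbers that only the
   * TT knows (total VM, total RAM, reserved offset), whether or not
   * monitoring is on, and let each scheduler decide what to do with them.
   */
  long[] heartbeatMemoryInfo() {
    return new long[] { totalVmOnMachine(), totalRamOnMachine(), reservedVm };
  }

  // Hypothetical stand-ins for however the TT obtains the machine totals.
  private long totalVmOnMachine()  { return NOT_SET; }
  private long totalRamOnMachine() { return NOT_SET; }
}
{code}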
What about schedulers? The Capacity Scheduler should do the following (a sketch of this decision appears after the list):
* If any of the three mandatory VM settings are not set, it should not schedule based on VM or RAM. The value of {{mapred.tasktracker.virtualmemory.reserved}} comes from the TT, while the other two can be read by the scheduler from its own config file.
* If the mandatory VM values are set, as well as the mandatory RAM values ({{mapred.capacity-scheduler.default.ramlimit}} and {{mapred.task.limit.maxram}}), the scheduler uses both VM and RAM settings to schedule, as defined in the earlier [summary|https://issues.apache.org/jira/browse/HADOOP-4035?focusedCommentId=12644267#action_12644267].
* If the mandatory VM values are set, but one or more of the mandatory RAM values are not, the scheduler uses only the VM values for scheduling.

It's possible that other schedulers may choose a different algorithm. What's important is that they have all the available information, which they should as per this proposal.
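Here is a minimal sketch of that three-way decision on the scheduler side. The class name and the {{SchedulingMode}} enum are hypothetical; only the configuration keys are from this proposal, and the presence of the TT-reported reserved value is passed in as a flag since it arrives via the heartbeat rather than the scheduler's config file.

{code:java}
import org.apache.hadoop.conf.Configuration;

/** Hypothetical sketch of the scheduler-side decision; not actual code. */
public class SchedulerMemoryModeSketch {

  enum SchedulingMode { VM_AND_RAM, VM_ONLY, NO_MEMORY_SCHEDULING }

  /**
   * Decide how to schedule, given the scheduler's own config and whether the
   * TT reported its mapred.tasktracker.virtualmemory.reserved value.
   */
  static SchedulingMode pickMode(Configuration conf,
                                 boolean ttReportedReservedVm) {
    // The reserved offset comes from the TT heartbeat; the other limits are
    // read from the scheduler's own config file. Only presence is tested here.
    boolean vmSet = ttReportedReservedVm
        && conf.get("mapred.task.default.maxvm") != null
        && conf.get("mapred.task.limit.maxvm") != null;
    boolean ramSet =
        conf.get("mapred.capacity-scheduler.default.ramlimit") != null
        && conf.get("mapred.task.limit.maxram") != null;

    if (!vmSet) {
      // Missing VM settings: don't schedule based on VM or RAM at all.
      return SchedulingMode.NO_MEMORY_SCHEDULING;
    }
    // All VM settings present: also use RAM if its settings are present.
    return ramSet ? SchedulingMode.VM_AND_RAM : SchedulingMode.VM_ONLY;
  }
}
{code}

Keeping the TT limited to reporting raw numbers and putting this decision entirely in the scheduler keeps the policy in one place, which is exactly the flexibility argued for above.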