[ https://issues.apache.org/jira/browse/HADOOP-4035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12648541#action_12648541 ]

Vivek Ratan commented on HADOOP-4035:
-------------------------------------

As Vinod has brought up, there are some edge cases and details missing in [the 
summary|https://issues.apache.org/jira/browse/HADOOP-4035?focusedCommentId=12644267#action_12644267]
 that we need to cover. 

We want monitoring to work independently of scheduler support, i.e., even if 
the scheduler you're using does not support memory-based scheduling, you may 
still want the TTs to monitor memory usage on their machines and kill tasks 
that use too much memory. Based on what we've described in the summary, the 
following three configuration settings are required for the TT to do 
monitoring: {{mapred.tasktracker.virtualmemory.reserved}} (the offset 
subtracted from the machine's total VM to get _max-VM-per-node_), 
{{mapred.task.default.maxvm}} (the default maximum VM per task), and 
{{mapred.task.limit.maxvm}} (the upper limit on the maximum VM per task). It 
is proposed that: 
* If one or more of these three values are missing from the configuration, the 
TT disables monitoring and logs an appropriate message. 
* At startup, the TT should also make sure that {{mapred.task.default.maxvm}} 
is not greater than {{mapred.task.limit.maxvm}}. If it is, the TT logs a 
message and disables monitoring.
* If all three are present, the TT has enough information to compute the 
_max-VM-per-task_ limit for each task it runs and can successfully monitor 
memory usage.
* Without scheduler support, the TT can get a task whose _max-VM-per-task_ 
limit is higher than {{mapred.task.limit.maxvm}} (i.e., the user-set value for 
a job's {{mapred.task.maxvm}} can be higher than {{mapred.task.limit.maxvm}}). 
In such a case, the TT can either fail the task or run it anyway while logging 
the problem. IMO, the former seems too harsh, and not a decision the TT should 
make based solely on its own monitoring settings. In the latter case, the TT 
can still continue monitoring, but may end up killing the wrong task if the 
sum of VM used by its tasks exceeds the _max-VM-per-node_ limit. I propose we 
do the latter (see the sketch after this list). 
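
As a concrete illustration of these checks, here's a minimal sketch of the 
startup validation and per-task limit computation. The names here 
({{MemoryMonitorConfig}}, {{maxVmForTask}}, the {{DISABLED}} sentinel) are 
hypothetical, not the actual TaskTracker code:

{code:java}
// Hypothetical sketch of the TT-side startup checks proposed above.
// Names are illustrative; this is not the actual TaskTracker code.
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.conf.Configuration;

class MemoryMonitorConfig {
  private static final Log LOG = LogFactory.getLog(MemoryMonitorConfig.class);
  static final long DISABLED = -1L; // sentinel for "not configured"

  final long reservedVm;   // mapred.tasktracker.virtualmemory.reserved
  final long defaultMaxVm; // mapred.task.default.maxvm
  final long limitMaxVm;   // mapred.task.limit.maxvm
  final boolean monitoringEnabled;

  MemoryMonitorConfig(Configuration conf) {
    reservedVm =
        conf.getLong("mapred.tasktracker.virtualmemory.reserved", DISABLED);
    defaultMaxVm = conf.getLong("mapred.task.default.maxvm", DISABLED);
    limitMaxVm = conf.getLong("mapred.task.limit.maxvm", DISABLED);

    if (reservedVm == DISABLED || defaultMaxVm == DISABLED
        || limitMaxVm == DISABLED) {
      // One or more of the three values is missing: disable monitoring.
      LOG.warn("Memory monitoring disabled: missing VM configuration");
      monitoringEnabled = false;
    } else if (defaultMaxVm > limitMaxVm) {
      // The default per-task maximum must not exceed the upper limit.
      LOG.warn("Memory monitoring disabled: mapred.task.default.maxvm"
          + " exceeds mapred.task.limit.maxvm");
      monitoringEnabled = false;
    } else {
      monitoringEnabled = true;
    }
  }

  /**
   * max-VM-per-task for one task: the job's own value if set, else the
   * default. Without scheduler support the job's value may exceed
   * limitMaxVm; per the proposal, the TT logs this but still runs the task.
   */
  long maxVmForTask(long jobMaxVm) {
    long max = (jobMaxVm == DISABLED) ? defaultMaxVm : jobMaxVm;
    if (max > limitMaxVm) {
      LOG.warn("Task asks for " + max + " VM, above mapred.task.limit.maxvm ("
          + limitMaxVm + "); running it anyway");
    }
    return max;
  }
}
{code}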

The TT also needs to report memory information to the schedulers. As per 
HADOOP-3759, TTs currently report, in each heartbeat, how much free VM they 
have (which is equal to _max-VM-per-node_ minus the sum of _max-VM-per-task_ 
for each running task). This makes sense if monitoring is on, and the three 
necessary VM config values are defined. If they're not, and the TT cannot 
determine its free VM, what should it report? 
* It can report -1, or some such value, indicating that it cannot compute free 
VM. 
* If we let schedulers decide how they want to behave in the absence of 
monitoring (or rather, in the absence of the necessary VM config values), a TT 
should always report how much total VM (as well as RAM) it has, along with its 
value for {{mapred.tasktracker.virtualmemory.reserved}}. 

I propose the latter. TTs always report how much VM and RAM they have on their 
system, and what offset settings they have; a sketch of such a report follows. 
They're the only ones with this information, and this approach gives the 
schedulers a lot of flexibility in how to use it. 
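
To illustrate, the heartbeat could carry raw values along these lines. The 
class and field names here are hypothetical, not the actual HADOOP-3759 
reporting API:

{code:java}
// Hypothetical sketch of the raw memory fields a TT would report in each
// heartbeat under this proposal; names are illustrative only.
class TaskTrackerMemoryReport {
  long totalVm;     // total virtual memory on the machine
  long totalRam;    // total physical memory on the machine
  long reservedVm;  // mapred.tasktracker.virtualmemory.reserved, -1 if unset

  // A scheduler that wants free VM can derive it on its own:
  //   max-VM-per-node = totalVm - reservedVm
  //   free VM = max-VM-per-node - sum of max-VM-per-task over running tasks
  // and can fall back to plain slot-based scheduling when reservedVm is -1.
}
{code}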

What about schedulers? The Capacity Scheduler should do the following (a 
sketch of the decision logic follows the list): 
* If any of the three mandatory VM settings are not set, it should not schedule 
based on VM or RAM. The value of {{mapred.tasktracker.virtualmemory.reserved}} 
comes from the TT, while the other two can be read by the scheduler from its 
own config file. 
* If the mandatory VM values are set, as well as the mandatory RAM values 
({{mapred.capacity-scheduler.default.ramlimit}}, {{mapred.task.limit.maxram}}), 
the scheduler uses both VM and RAM settings to schedule, as defined in the 
earlier 
[summary|https://issues.apache.org/jira/browse/HADOOP-4035?focusedCommentId=12644267#action_12644267].
 
* If the mandatory VM values are set, but one or more of the mandatory RAM 
values are not, the scheduler only uses VM values for scheduling. 
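
Here's a minimal sketch of that decision, assuming hypothetical class and 
method names and reading the RAM settings as raw longs purely for 
illustration:

{code:java}
// Hypothetical sketch of how the Capacity Scheduler could pick a
// scheduling mode from the available settings; names are illustrative.
import org.apache.hadoop.conf.Configuration;

class SchedulingModePicker {
  enum Mode { SLOTS_ONLY, VM_ONLY, VM_AND_RAM }

  // ttReservedVm comes from the TT's heartbeat; -1 means it was unset.
  static Mode pick(Configuration conf, long ttReservedVm) {
    long defaultMaxVm = conf.getLong("mapred.task.default.maxvm", -1);
    long limitMaxVm = conf.getLong("mapred.task.limit.maxvm", -1);
    if (ttReservedVm == -1 || defaultMaxVm == -1 || limitMaxVm == -1) {
      return Mode.SLOTS_ONLY; // a mandatory VM setting is missing
    }
    long defaultRamLimit =
        conf.getLong("mapred.capacity-scheduler.default.ramlimit", -1);
    long limitMaxRam = conf.getLong("mapred.task.limit.maxram", -1);
    if (defaultRamLimit == -1 || limitMaxRam == -1) {
      return Mode.VM_ONLY; // VM values present, RAM values missing
    }
    return Mode.VM_AND_RAM;
  }
}
{code}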

Other schedulers may well choose a different algorithm. What's important is 
that they have all the available information, which, under this proposal, they 
do. 


> Modify the capacity scheduler (HADOOP-3445) to schedule tasks based on memory 
> requirements and task trackers free memory
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4035
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4035
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>    Affects Versions: 0.19.0
>            Reporter: Hemanth Yamijala
>            Assignee: Vinod K V
>            Priority: Blocker
>             Fix For: 0.20.0
>
>         Attachments: 4035.1.patch, HADOOP-4035-20080918.1.txt, 
> HADOOP-4035-20081006.1.txt, HADOOP-4035-20081006.txt, HADOOP-4035-20081008.txt
>
>
> HADOOP-3759 introduced configuration variables that can be used to specify 
> memory requirements for jobs, and also modified the tasktrackers to report 
> their free memory. The capacity scheduler in HADOOP-3445 should schedule 
> tasks based on these parameters. A task that is scheduled on a TT that uses 
> more than the default amount of memory per slot can be viewed as effectively 
> using more than one slot, as it would decrease the amount of free memory on 
> the TT by more than the default amount while it runs. The scheduler should 
> make the used capacity account for this additional usage while enforcing 
> limits, etc.
