[ https://issues.apache.org/jira/browse/HADOOP-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689061#action_12689061 ]
Vinod K V commented on HADOOP-5568:
-----------------------------------
The primary observation while testing TaskMemoryManager is that it is not able
to prevent nodes from going down when rogue tasks start consuming memory. It
currently does the following:
- It monitors the memory usage of each task (the task JVM and the descendant
processes), and makes sure that the task is failed if the task goes beyond its
memory requirements (specified via mapred.task.maxvmem).
- Further, it also monitors the memory usage of all tasks running on a TT and
makes sure that cumulative memory usage doesn't cross a specific limit (Total
TT Vmem less mapred.tasktracker.vmem.reserved) by killing the least-progress
tasks to bring down the memory usage.
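
The two checks above can be sketched roughly as follows. This is only an
illustration under stated assumptions, not the actual TaskMemoryManager code:
the class MemoryMonitorSketch, the TaskUsage type and the selectTasksToKill
method are hypothetical names, memory is expressed in MB for brevity, and the
order of the two steps within one monitoring pass is assumed.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class MemoryMonitorSketch {

    /** Hypothetical per-task snapshot; not a real Hadoop class. */
    static final class TaskUsage {
        final String id;
        final long vmemMb;    // usage of the task JVM plus its descendant processes
        final long limitMb;   // per-task limit (cf. mapred.task.maxvmem)
        final double progress; // 0.0 .. 1.0

        TaskUsage(String id, long vmemMb, long limitMb, double progress) {
            this.id = id;
            this.vmemMb = vmemMb;
            this.limitMb = limitMb;
            this.progress = progress;
        }
    }

    /** Returns the ids of the tasks one monitoring pass would fail or kill. */
    static List<String> selectTasksToKill(List<TaskUsage> tasks,
                                          long totalVmemMb,
                                          long reservedVmemMb) {
        List<String> doomed = new ArrayList<>();
        List<TaskUsage> survivors = new ArrayList<>();
        long cumulativeMb = 0;

        // Check 1: fail any task that exceeds its own limit.
        for (TaskUsage t : tasks) {
            if (t.vmemMb > t.limitMb) {
                doomed.add(t.id);
            } else {
                survivors.add(t);
                cumulativeMb += t.vmemMb;
            }
        }

        // Check 2: keep cumulative usage under (total - reserved) by killing
        // the tasks with the least progress first.
        long usableMb = totalVmemMb - reservedVmemMb;
        survivors.sort(Comparator.comparingDouble((TaskUsage t) -> t.progress));
        for (TaskUsage t : survivors) {
            if (cumulativeMb <= usableMb) {
                break;
            }
            doomed.add(t.id);
            cumulativeMb -= t.vmemMb;
        }
        return doomed;
    }

    public static void main(String[] args) {
        List<TaskUsage> tasks = new ArrayList<>();
        tasks.add(new TaskUsage("a", 300, 200, 0.5)); // over its own limit
        tasks.add(new TaskUsage("b", 400, 512, 0.9));
        tasks.add(new TaskUsage("c", 400, 512, 0.1)); // least progress
        tasks.add(new TaskUsage("d", 300, 512, 0.5));
        // Usable Vmem = 1100 - 100 = 1000 MB.
        System.out.println(selectTasksToKill(tasks, 1100, 100)); // prints [a, c]
    }
}
```

Note that such a pass only reacts to usage that has already been observed, so
how quickly memory grows between two passes matters.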
The per-task monitoring works fine for tasks whose memory grows at moderate
rates of up to around 100MB/sec. There are problems with the cumulative-usage
monitoring:
- The limit mapred.task.limit.maxvmem was originally meant to prevent jobs
from asking for too much memory. If a single task asks for memory nearing the
total usable Vmem on the TT, we don't prevent the task from running; as of now
we just log at warn level in the TT if the request crosses
mapred.task.limit.maxvmem. This is very problematic without any support for
memory-based scheduling, as tasks can potentially bring down nodes, and we have
seen instances of this.
- Even if the tasks are within limits, as mapred.task.limit.maxvmem is not
really enforced, cumulative usage near total usable Vmem on the TT brings down
the node, and we have seen instances of this too.
> TaskMemoryManager not enforcing memory limits in the presence of rogue tasks
> ----------------------------------------------------------------------------
>
> Key: HADOOP-5568
> URL: https://issues.apache.org/jira/browse/HADOOP-5568
> Project: Hadoop Core
> Issue Type: Bug
> Components: mapred
> Reporter: Vinod K V
>