[
https://issues.apache.org/jira/browse/MAPREDUCE-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788508#action_12788508
]
Vinod K V commented on MAPREDUCE-1018:
--------------------------------------
Looked at the latest patch. I've some more comments. Some of the following may
not have been introduced by your patch though.
h4. cluster_setup.html:
- Statement I: _Users can, optionally, specify the MEM task-limit per job. If
no such limit is provided, a default limit is used. A node-limit can be set per
node._
We don't have default limits anymore. So the statement _"If no such limit is
provided, a default limit is used."_ can be removed.
- Node-limit cannot be set directly anymore. So, we should define the node
limit here by saying _"Node-limit of total memory usage for tasks is given by
Node-limit = mapreduce.tasktracker.map.tasks.maximum *
mapreduce.cluster.mapmemory.mb + mapreduce.tasktracker.reduce.tasks.maximum *
mapreduce.cluster.reducememory.mb"_.
- _To enable monitoring for a TT, the following parameters all need to be set:_
This is not true w.r.t. job configuration, so this table should only have TT
configuration; the job configuration should move to another table. We should
also move Statement I further down, so that the whole section looks like
{quote}
To enable monitoring for a TT, the following parameters all need to be set:
TABLE:I with TT parameters
Node-limit of total memory usage for tasks is given by Node-limit =
mapreduce.tasktracker.map.tasks.maximum * mapreduce.cluster.mapmemory.mb +
mapreduce.tasktracker.reduce.tasks.maximum * mapreduce.cluster.reducememory.mb
Users can, optionally, specify the MEM task-limit per job.
TABLE II with Job parameters
{quote}
- _2. Periodically, the TT checks the following:_
Should be "If memory monitoring is enabled, the TT does the following
periodically:"
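The node-limit definition proposed above can be sketched in a few lines of Python. This is only an illustration: it assumes the reduce term uses mapreduce.cluster.reducememory.mb, and the function name and sample values are hypothetical, not taken from the patch.

```python
def node_limit_mb(map_slots, map_slot_mb, reduce_slots, reduce_slot_mb):
    """Total memory (in MB) that tasks on one TT may use.

    map_slots      ~ mapreduce.tasktracker.map.tasks.maximum
    map_slot_mb    ~ mapreduce.cluster.mapmemory.mb
    reduce_slots   ~ mapreduce.tasktracker.reduce.tasks.maximum
    reduce_slot_mb ~ mapreduce.cluster.reducememory.mb (assumed)
    """
    return map_slots * map_slot_mb + reduce_slots * reduce_slot_mb


# Hypothetical TT: 4 map slots of 1024 MB and 2 reduce slots of 2048 MB.
print(node_limit_mb(4, 1024, 2, 2048))
```

With these sample values the limit is 4 * 1024 + 2 * 2048 MB.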
h4. mapred_tutorial.html:
- The memory management section defines the job parameters again here, but
"mapreduce.map.memory.mb" appears twice; one of them should be
"mapreduce.reduce.memory.mb".
- _Users can choose to override default limits of memory enforced by the task
tracker, if memory management is enabled. Users can set the following parameter
per job:_
Please modify "memory management" to be "memory monitoring" and link it to the
monitoring section in cluster_setup.html
h4. capacity_scheduler.html:
- Please rename the section name to "Memory-based task-scheduling"
- _The Capacity Scheduler supports scheduling of tasks on a TaskTracker(TT)
based on a job's memory requirements and the availability of RAM and Virtual
Memory (VMEM) on the TT node._
As my previous review comments mentioned, support for RAM availability is no
longer there. So it should read _"....and the availability of Virtual Memory
(VMEM) on the TT node. There isn't any need for distinguishing VMEM from
physical memory w.r.t. tasks. Depending on a site's requirements, the
configuration can be set according to whether tasks should be allowed to go
beyond physical memory or not."_
- _"See the MapReduce Tutorial for details on how the TT monitors memory
usage."_
This should actually point to cluster_setup.html. My previous review comment
on this was missed.
- _"Currently the memory based scheduling is only supported in Linux
platform."_
This isn't quite right. It should be _"Memory-based scheduling primarily
exists to avoid memory pressure by tasks on a TT and is thus dependent on
TT memory monitoring, which is currently supported only on the Linux platform."_
- _"1. The absence of any one or more of four config parameters or -1 being
set as value of any of the parameters, mapreduce.cluster.mapmemory.mb,
mapreduce.cluster.reducememory.mb, or mapreduce.jobtracker.maxmapmemory.mb,
mapreduce.jobtracker.maxreducememory.mb disables memory-based scheduling, just
as it disables memory monitoring for a TT. These config parameters are
described in the MapReduce Tutorial. "_
This can be greatly simplified to _"The configuration properties
mapreduce.cluster.mapmemory.mb, mapreduce.cluster.reducememory.mb,
mapreduce.jobtracker.maxmapmemory.mb and mapreduce.jobtracker.maxreducememory.mb
are used to enable/disable memory-based scheduling. If any one of these
properties is absent or set to -1, memory-based scheduling is disabled, just as
monitoring is disabled for a TT. These parameters are described in the
Cluster-setup <a href="cluster_setup.html#memory_monitoring">memory-monitoring
section</a>."_
- The second statement that describes scheduling can be greatly simplified by
writing it as a list of points, like my previous review described. Also, we
haven't introduced reservations anywhere else, so that part also needs to be
explained clearly. Roughly,
{quote}
* Point 2 in Working of scheduling
Total = numSlots * PerSlotMemSize.
Used = Sigma(numSlotsPerTask * PerSlotMemSize)
if (can fit), schedule. Otherwise reserve. Reserve why? what? How many?
{quote}
That was just a rough cut, but should give you an idea
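
To make the rough cut above more concrete, here is a minimal Python sketch of the enable check and the fit test. All function names are hypothetical. It assumes a job takes as many slots as are needed to cover its memory request (rounded up), and it deliberately leaves the reservation details (why, what, how many) open, since those are exactly what the documentation still has to explain.

```python
DISABLED = -1

# The four properties named in the review comment above.
SCHEDULING_KEYS = (
    "mapreduce.cluster.mapmemory.mb",
    "mapreduce.cluster.reducememory.mb",
    "mapreduce.jobtracker.maxmapmemory.mb",
    "mapreduce.jobtracker.maxreducememory.mb",
)


def scheduling_enabled(conf):
    """Memory-based scheduling is on only if every property is set and not -1."""
    return all(conf.get(key, DISABLED) != DISABLED for key in SCHEDULING_KEYS)


def slots_for_task(task_mem_mb, per_slot_mb):
    """Slots a task occupies: its memory request rounded up to whole slots (assumed)."""
    return -(-task_mem_mb // per_slot_mb)


def decide(num_slots, per_slot_mb, running_task_slots, task_mem_mb):
    """Return "schedule" if the task fits on the TT, otherwise "reserve"."""
    total = num_slots * per_slot_mb                            # Total = numSlots * PerSlotMemSize
    used = sum(n * per_slot_mb for n in running_task_slots)    # Used = Sigma(numSlotsPerTask * PerSlotMemSize)
    needed = slots_for_task(task_mem_mb, per_slot_mb) * per_slot_mb
    return "schedule" if used + needed <= total else "reserve"


# Hypothetical TT with 4 slots of 1024 MB, running tasks that hold 1 and 2 slots.
print(decide(4, 1024, [1, 2], 1024))
print(decide(4, 1024, [1, 2], 2048))
```

In the first call the 1024 MB task fits in the one remaining slot; in the second the 2048 MB task needs two slots and does not fit, so the sketch falls back to a reservation.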
> Document changes to the memory management and scheduling model
> --------------------------------------------------------------
>
> Key: MAPREDUCE-1018
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1018
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: documentation
> Affects Versions: 0.21.0
> Reporter: Hemanth Yamijala
> Assignee: rahul k singh
> Priority: Blocker
> Fix For: 0.21.0
>
> Attachments: MAPRED-1018-1.patch, MAPRED-1018-2.patch,
> MAPRED-1018-3.patch, MAPRED-1018-commons.patch
>
>
> There were changes done for the configuration, monitoring and scheduling of
> high ram jobs. This must be documented in the mapred-defaults.xml and also on
> forrest documentation