[
https://issues.apache.org/jira/browse/MAPREDUCE-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778231#action_12778231
]
Vinod K V commented on MAPREDUCE-1018:
--------------------------------------
Many other mapreduce issues have already modified only cluster-setup.xml,
the one in the mapreduce project. Rahul mentioned offline that forrest
documentation is not getting generated in the mapreduce sub-project. Assuming
we'll address that in a separate issue, I propose we have only one patch - the
mapred one.
- The mapred patch has git prefixes, which need to be removed.
- Monitoring/Scheduling based on RAM is completely removed. So remove the
references too. Just add a note saying that (quoting from HADOOP-5881) there
isn't any need for distinguishing vmem from physical memory w.r.t
configuration. Depending on a site's requirements, the configuration items can
reflect whether one wants tasks to go beyond physical memory or not.
cluster_setup.html
- All config names should be renamed to the new names. Of course, this means a
slightly different patch for 0.20 - which we will come to after the patch
for trunk is done.
- mapred.{map|reduce}.child.ulimit also need to be renamed
- What happens when monitoring is enabled, but job has -1?
- Memory-monitoring is no longer defined in terms of per-task-limit and
per-node-limit. It is now driven by per-slot-size and number of slots. We
should use these new terms throughout.
- "Before getting into details, consider the following additional
memory-related parameters than can be configured to enable better scheduling:"
The above line is no longer needed.
capacity_scheduler.html
- The feature for monitoring RAM no longer exists. Remove all references.
- Working of scheduling
-- Point 1: There are four parameters, not three. The parameters are described
in cluster_setup. vmem.reserved is no longer used.
-- Point 2: This is changed completely. No more offsets.
Total = numSlots * PerSlotMemSize.
Used = Sigma(numSlotsPerTask * PerSlotMemSize)
-- Point 3: JT now rejects the jobs, not the scheduler.
- "See the MapReduce Tutorial for details on how the TT monitors memory usage."
"See cluster_setup" instead?
- Need to update mapred_tutorial.html's memory management section. Also need a
reference to this in both cluster_setup.html and capacity_scheduler.html.
- Another point I've already mentioned on the JIRA.
"Along with everything else, we should document that job setup and job cleanup
tasks of all jobs, either requiring or not requiring high memory for their maps
and reduces, still run on a single slot."
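As a rough illustration of the Point 2 accounting above (Total = numSlots *
PerSlotMemSize, Used = Sigma(numSlotsPerTask * PerSlotMemSize)), here is a
minimal Python sketch. The function names and the example figures (4 slots of
1024 MB each) are made up for illustration; they are not Hadoop APIs or
configuration values.

```python
def total_memory(num_slots, per_slot_mem_size):
    """Total = numSlots * PerSlotMemSize."""
    return num_slots * per_slot_mem_size


def used_memory(slots_per_task, per_slot_mem_size):
    """Used = Sigma(numSlotsPerTask * PerSlotMemSize) over running tasks."""
    return sum(n * per_slot_mem_size for n in slots_per_task)


# Hypothetical node: 4 slots of 1024 MB each, running one single-slot task
# and one task that occupies two slots.
total = total_memory(4, 1024)
used = used_memory([1, 2], 1024)
print(total, used, total - used)  # prints: 4096 3072 1024
```

With no offsets involved, the free memory on a node is simply Total minus Used.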
> Document changes to the memory management and scheduling model
> --------------------------------------------------------------
>
> Key: MAPREDUCE-1018
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1018
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: documentation
> Affects Versions: 0.21.0
> Reporter: Hemanth Yamijala
> Priority: Blocker
> Fix For: 0.21.0
>
> Attachments: MAPRED-1018-1.patch, MAPRED-1018-2.patch,
> MAPRED-1018-commons.patch
>
>
> There were changes done for the configuration, monitoring and scheduling of
> high ram jobs. This must be documented in the mapred-defaults.xml and also on
> forrest documentation