[jira] Commented: (MAPREDUCE-1018) Document changes to the memory management and scheduling model

Vinod K V (JIRA) Sun, 15 Nov 2009 23:01:07 -0800

    [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778231#action_12778231
 ]


Vinod K V commented on MAPREDUCE-1018:
--------------------------------------

Many other mapreduce issues have already modified only cluster-setup.xml, the 
one in the mapreduce project. Rahul mentioned offline that the forrest 
documentation is not getting generated in the mapreduce sub-project. Assuming 
we'll address that in a separate issue, I propose we have only one patch - the 
mapred one.

 - The mapred patch has git prefixes, which need to be removed.

 - Monitoring/Scheduling based on RAM is completely removed. So remove the 
references too. Just add a note saying that (quoting from HADOOP-5881) there 
isn't any need for distinguishing vmem from physical memory w.r.t 
configuration. Depending on a site's requirements, the configuration items can 
reflect whether one wants tasks to go beyond physical memory or not.

cluster_setup.html
 - All config names should be renamed to the new names. Of course, this means a 
slightly different patch for 0.20, which we will come to after the patch for 
trunk is done.
 - mapred.{map|reduce}.child.ulimit also need to be renamed
 - What happens when monitoring is enabled, but a job has -1? This should be 
documented.
 - Memory-monitoring is no longer defined in terms of per-task-limit and 
per-node-limit. It is now driven by per-slot-size and number of slots. We 
should use these new terms throughout.
 - "Before getting into details, consider the following additional 
memory-related parameters than can be configured to enable better scheduling:"
 The above line is no longer needed.
 
capacity_scheduler.html
 - The feature for monitoring RAM no longer exists. Remove all references.
 - Working of scheduling
   -- Point 1: Four parameters, not three. The parameters are described in 
cluster_setup. vmem.reserved is no longer used.
   -- Point 2: This is changed completely. No more offsets.
   Total = numSlots * PerSlotMemSize.
   Used = Sigma(numSlotsPerTask * PerSlotMemSize)
   -- Point 3: JT now rejects the jobs, not the scheduler.
 - "See the MapReduce Tutorial for details on how the TT monitors memory usage."
 "See cluster_setup" instead?
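
For illustration, the accounting described in Point 2 could be sketched as 
below. This is only a minimal Python sketch of the two formulas, not the 
scheduler's actual code; the function names and the sample numbers are 
hypothetical.

```python
def total_memory(num_slots, per_slot_mem_size):
    """Total = numSlots * PerSlotMemSize."""
    return num_slots * per_slot_mem_size


def used_memory(slots_per_task, per_slot_mem_size):
    """Used = Sigma(numSlotsPerTask * PerSlotMemSize) over running tasks."""
    return sum(n * per_slot_mem_size for n in slots_per_task)


# Hypothetical numbers, for illustration only: a node with 4 slots of
# 1024 MB each, running one task that takes 1 slot and one that takes 2.
total = total_memory(num_slots=4, per_slot_mem_size=1024)
used = used_memory(slots_per_task=[1, 2], per_slot_mem_size=1024)
print(total, used)
```

No offsets appear anywhere in the calculation; both quantities depend only on 
the slot size and slot counts.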

 - Need to update mapred_tutorial.html's memory management section. Also need a 
reference to this in both cluster_setup.html and capacity_scheduler.html.

 - Another point I've already mentioned on the JIRA:
 "Along with everything else, we should document that job setup and job cleanup 
tasks of all jobs, either requiring or not requiring high memory for their maps 
and reduces, still run on a single slot."

> Document changes to the memory management and scheduling model
> --------------------------------------------------------------
>
>                 Key: MAPREDUCE-1018
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1018
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: documentation
>    Affects Versions: 0.21.0
>            Reporter: Hemanth Yamijala
>            Priority: Blocker
>             Fix For: 0.21.0
>
>         Attachments: MAPRED-1018-1.patch, MAPRED-1018-2.patch, 
> MAPRED-1018-commons.patch
>
>
> There were changes done for the configuration, monitoring and scheduling of 
> high ram jobs. This must be documented in the mapred-defaults.xml and also on 
> forrest documentation

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
