[ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12540134 ]
Nigel Daley commented on HADOOP-1917:
-------------------------------------

Ok, final set of comments on the tutorial:

Application typically implement -> Applications typically implement

These represent the core -> These form the core

<code>Mapper</code> implementations can access the <code>JobConf</code> ... -> <code>Mapper</code> implementations are passed the <code>JobConf</code> via the ...
(discuss the ordering guarantees of the calls made to the Mapper methods: configure, map, close)

"de-initialization" -> "finalization" or "tear down" or "cleanup"
(the above 2 comments also apply to the Reducer section)

"The framework then calls" makes it sound like you were previously talking about the sequencing of calls (which I don't think you were)

"to report progress, status, counters and so on, or just indicate that they are alive" -> "to report progress, status, and counters" (it looks like that's all you can do with the Reporter interface)
(the above comment also applies to the Reducer section)

"The grouped <code>Mapper</code> outputs are partitioned per <code>Reducer</code>" (I think this concept needs more explanation as it's not obvious to the new user)

which is only a hint -> which only provides a hint

conjunction to simulate -> conjunction to simulate a

If equivalence rules for keys while grouping the intermediates are different from those for grouping keys before reduction -> If equivalence rules for grouping the intermediate keys are required to be different from those for grouping keys before reduction

<em>not re-sorted</em> -> <em>not sorted</em> by the framework

<code>zero</code> -> <em>zero</em>

is sent for reduction -> is sent to for reduction

possibly link to HashPartitioner javadoc

insignificant amount of time -> significant amount of time

even to <code>zero</code> -> even to <em>zero</em> (as written, it looks like the user should do this: mapred.task.timeout=zero which is clearly wrong)

job-configuration -> job configuration

Should the job conf section describe how job configs can be set? i.e. command line, programmatically, config files, etc.?

record-oriented view for the -> record-oriented view to the

write out the output files -> write the output files

Tasks' Side-Effect Files -> Task Side-Effect Files

Some applications need -> In some applications the tasks need

To avoid thes issues -> To avoid these issues

completion of the task-attempt -> completion of the task-attempt,

Applications specify the files, via urls (hdfs:// or http://) to be cached via the <code>JobConf</code> -> Applications specify the files to be cached via urls (hdfs:// or http://) configured in the <code>JobConf</code>

are only copied once per job and the ability to cache archives which are un-archived on the slaves -> are copied (and un-archived if necessary) only once per job on each slave

> Need configuration guides for Hadoop
> ------------------------------------
>
>         Key: HADOOP-1917
>         URL: https://issues.apache.org/jira/browse/HADOOP-1917
>     Project: Hadoop
>  Issue Type: Improvement
>  Components: conf
> Affects Versions: 0.14.1
>    Reporter: Sameer Paranjpye
>    Assignee: Arun C Murthy
>    Priority: Critical
>     Fix For: 0.16.0
>
> Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch, HADOOP-1917_4_20071105.patch
>
> We've recently had a spate of questions on the users list regarding features such as rack-awareness, the trash can etc. which are not clearly documented from a user/admin perspective. There is some Javadoc present but most of the "documentation" exists either in JIRA or in the default config files themselves.
> We should generate top down configuration and use guides for map/reduce and HDFS. These should probably be in Forrest and accessible from the project website (Javadoc isn't always approachable to our non-programmer audience).
> Committers should look for user documentation before accepting patches.

--
This message is automatically generated by JIRA.