[ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539171 ]
Milind Bhandarkar commented on HADOOP-1917:
-------------------------------------------

Comments on HADOOP-1917

Overview.html:

"Hadoop was been" -> "Hadoop has been"
"Optionally install rsync must be installed" -> "Optionally install rsync"
"build it with ant" -> *what's the ant target?*
What's the default for HADOOP_LOG_DIR?
"$ bin/hadoop dfs -put input input" -> "$ bin/hadoop dfs -put conf input"
Should there be a step to examine the web UI for the JobTracker (JT) and NameNode (NN)?

setup.html:

HADOOP_HEAPSIZE -> *need some typical values here?*
"where the NameNode stores the name table" -> "where the NameNode stores the namespace and transaction logs persistently"
"server and client machines." -> *need to document early that the NameNode and JobTracker are server machines, and "DataNode+TaskTracker" are client machines*
"slave processors" -> *please use consistent terminology; prefer "worker" to "slave"*
*argh.. the name "slaves" is hardcoded as the file name conf/slaves in Hadoop. I should probably file a JIRA.*
Also, mapred.map.tasks and mapred.reduce.tasks should *not* be marked final in typical cases.

mapred_tutorial.html:

*consider removing the Google MapReduce paper link as a prerequisite, since the goal of the tutorial is to provide all the information needed to understand map-reduce*
*A picture would help in the overview.*
*In the Input and Output section, remove the use of the combiner.*
*In the WordCount example, simplify it even more by avoiding the use of ToolRunner.*
"submission amp;" -> "submission and"
"de-initialization" -> "finalization? clean-up?"
*wherever overriding is mentioned, also mention the default value, e.g. partitioner, InputFormat, InputSplit etc.*
*please provide a javadoc link to DistributedCache at the first mention*

Overall comments:

This is extremely useful. However, the level of detail is overwhelming for a MapReduce tutorial. Maybe split this into two? Basic and Advanced. Basic should be enough to understand WordCount, and Advanced should then go into all the details.
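The request to state the default wherever overriding is mentioned can be illustrated for the partitioner. The sketch below is a plain-Java approximation, with no Hadoop dependency, of the rule applied by Hadoop's default HashPartitioner: clear the sign bit of the key's hashCode(), then take the remainder modulo the number of reduce tasks. The class name DefaultPartitionSketch and the method partitionFor are hypothetical, chosen for illustration only; they are not part of the Hadoop API.

```java
// Hypothetical, dependency-free sketch of the partitioning rule used by
// Hadoop's default HashPartitioner. Not the Hadoop class itself.
public class DefaultPartitionSketch {

    // Returns the reduce-task index (0 .. numReduceTasks-1) that a key is routed to.
    public static int partitionFor(Object key, int numReduceTasks) {
        if (numReduceTasks <= 0) {
            throw new IllegalArgumentException("numReduceTasks must be positive");
        }
        // "& Integer.MAX_VALUE" clears the sign bit, so keys with a negative
        // hashCode() still map to a non-negative partition number.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int reducers = 4;
        // Repeated words always land on the same reduce task.
        for (String word : new String[] {"hadoop", "map", "reduce", "hadoop"}) {
            System.out.println(word + " -> reduce task " + partitionFor(word, reducers));
        }
    }
}
```

Because equal keys always produce equal hash codes, every occurrence of a word in WordCount is routed to the same reduce task, which is what lets a single reducer sum all of that word's counts. A tutorial could state this default next to the discussion of writing a custom partitioner.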
> Need configuration guides for Hadoop
> ------------------------------------
>
> Key: HADOOP-1917
> URL: https://issues.apache.org/jira/browse/HADOOP-1917
> Project: Hadoop
> Issue Type: Improvement
> Components: conf
> Affects Versions: 0.14.1
> Reporter: Sameer Paranjpye
> Assignee: Arun C Murthy
> Priority: Critical
> Fix For: 0.16.0
>
> Attachments: HADOOP-1917_1_20071025.patch, HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch
>
>
> We've recently had a spate of questions on the users list regarding features such as rack-awareness, the trash can etc. which are not clearly documented from a user/admin's perspective. There is some Javadoc present but most of the "documentation" exists either in JIRA or in the default config files themselves.
> We should generate top-down configuration and use guides for map/reduce and HDFS. These should probably be in Forrest and accessible from the project website (Javadoc isn't always approachable to our non-programmer audience).
> Committers should look for user documentation before accepting patches.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.