[ 
https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539171
 ] 

Milind Bhandarkar commented on HADOOP-1917:
-------------------------------------------

Comments on HADOOP-1917

Overview.html:

"Hadoop was been" -> "Hadoop has been"
"Optionally install rsync must be installed" -> "Optionally install rsync"
"build it with ant" -> *what's the ant target ?*
what's the default for HADOOP_LOG_DIR ?
"$ bin/hadoop dfs -put input input" -> "$ bin/hadoop dfs -put conf input"

should there be a step to examine web-ui for JT and NN ?


setup.html:

HADOOP_HEAPSIZE -> *need some typical values here ?*
"where the NameNode stores the name table" -> "where the NameNode stores the 
namespace and transaction logs persistently"
"server and client machines." -> *need to document early that NameNode and 
JobTracker are server machines, and "DataNode+TaskTracker" are client machines*
"slave processors" -> *please use consistent terminology, prefer "worker" to 
"slave"*
*argh.. "slaves" name is hardcoded as a file name conf/slaves in hadoop. I 
should probably file a jira*

Also, mapred.map.tasks and mapred.reduce.tasks should *not* be marked final in 
typical cases.
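
For illustration, a minimal sketch of what this might look like in 
conf/hadoop-site.xml, assuming the <final> element is how the guide marks a 
property as non-overridable. The values below are placeholders, not 
recommendations:

```xml
<!-- conf/hadoop-site.xml fragment; values are illustrative only -->
<property>
  <name>mapred.map.tasks</name>
  <value>10</value>
  <!-- no <final>true</final> here, so an individual job can override it -->
</property>
<property>
  <name>mapred.reduce.tasks</name>
  <value>2</value>
  <!-- likewise left non-final -->
</property>
```

Leaving these two properties non-final keeps a site-wide default while 
still letting each job set its own task counts.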

mapred_tutorial.html:

*consider removing google mapreduce paper link as prerequisite, since the goal 
of the tutorial is to provide all the information needed to understand 
map-reduce*
*A picture would help in the overview.*
*In the Input and Output section, remove the use of combiner.*
*In the wordcount example, simplify it even more by avoiding the use of 
ToolRunner*
"submission amp;" -> "submission and"
"de-initialization" -> "finalization? clean-up?"
*wherever overriding is mentioned, also mention the default value. e.g. 
partitioner, inputformat, inputsplit etc.*
*please provide a javadoc link to DistributedCache at the first mention*


Overall comments: This is extremely useful. However, the level of detail is 
overwhelming for a map-reduce tutorial. Maybe split this into two, Basic and 
Advanced ? Basic should be enough to understand WordCount, and Advanced should 
then go into all the details.

> Need configuration guides for Hadoop
> ------------------------------------
>
>                 Key: HADOOP-1917
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1917
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 0.14.1
>            Reporter: Sameer Paranjpye
>            Assignee: Arun C Murthy
>            Priority: Critical
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-1917_1_20071025.patch, 
> HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch
>
>
> We've recently had a spate of questions on the users list regarding features 
> such as rack-awareness, the trash can etc. which are not clearly documented 
> from a user/admin perspective. There is some Javadoc present but most of the 
> "documentation" exists either in JIRA or in the default config files 
> themselves.
> We should generate top down configuration and use guides for map/reduce and 
> HDFS. These should probably be in Forrest and accessible from the project 
> website (Javadoc isn't always approachable to our non-programmer audience). 
> Committers should look for user documentation before accepting patches.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
