[ 
https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539171
 ] 

milindb edited comment on HADOOP-1917 at 10/31/07 1:10 PM:
---------------------------------------------------------------------

Comments on HADOOP-1917

Overview.html:

"Hadoop was been" -> "Hadoop has been"
"Optionally install rsync must be installed" _> "Optionally install rsync"
"build it with ant" -> whats the ant target ?
what's the default for HADOOP_LOG_DIR ?
"$ bin/hadoop dfs -put input input" -> "$ bin/hadoop dfs -put conf input"

should there be a step to examine the web UI for the JT and NN?


setup.html:

HADOOP_HEAPSIZE -> need some typical values here?
"where the NameNode stores the name table" -> "where the NameNode stores the 
namespace and transactions logs persistently"
"server and client machines." -> need to document early that NameNode and 
JobTracker are server machines, and "DataNode+TaskTracker" are client machines
"slave processors" -> please use consistent terminology, prefer "worker" to 
"slave"
argh.. "slaves" name is hardcoded as a file name conf/slaves in hadoop. I 
should probably file a jira

Also, mapred.map.tasks and mapred.reduce.tasks should *not* be marked final in 
typical cases.
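
To make the "final" point concrete, here is a rough sketch of the per-job
override that marking those properties final would silently disable. This
assumes the old org.apache.hadoop.mapred API (JobConf setters); the class name
is made up for illustration:

{code}
import org.apache.hadoop.mapred.JobConf;

public class TaskCountExample {
  public static void main(String[] args) {
    // Per-job override of the task counts via JobConf.
    // If mapred.map.tasks / mapred.reduce.tasks were marked <final> in the
    // cluster configuration, these per-job settings would be ignored.
    JobConf conf = new JobConf(TaskCountExample.class);
    conf.setNumMapTasks(20);    // a hint; the actual count follows the input splits
    conf.setNumReduceTasks(4);  // taken as given

    System.out.println("mapred.map.tasks = " + conf.get("mapred.map.tasks"));
    System.out.println("mapred.reduce.tasks = " + conf.get("mapred.reduce.tasks"));
  }
}
{code}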

mapred_tutorial.html:

consider removing the Google MapReduce paper link as a prerequisite, since the
goal of the tutorial is to provide all the information needed to understand
map-reduce
A picture would help in the overview.
In the Input and Output section, remove the use of combiner.
In the WordCount example, simplify it even more by avoiding the use of
ToolRunner (see the driver sketch after this list)
"submission amp;" -> "submission and"
"de-initialization" -> "finalization? clean-up?"
wherever overriding is mentioned, also mention the default value, e.g.
partitioner, InputFormat, InputSplit, etc.
please provide a javadoc link to DistributedCache at the first mention
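
On the ToolRunner point above, this is roughly the kind of bare-bones driver I
have in mind -- a sketch only, assuming the 0.16-era org.apache.hadoop.mapred
API and the same Map/Reduce logic as the tutorial's WordCount; exact signatures
should be double-checked against the current tree:

{code}
import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class WordCount {

  // Tokenize each line and emit <word, 1> pairs.
  public static class Map extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output, Reporter reporter)
        throws IOException {
      StringTokenizer tokenizer = new StringTokenizer(value.toString());
      while (tokenizer.hasMoreTokens()) {
        word.set(tokenizer.nextToken());
        output.collect(word, one);
      }
    }
  }

  // Sum the counts for each word; also usable as the combiner.
  public static class Reduce extends MapReduceBase
      implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output, Reporter reporter)
        throws IOException {
      int sum = 0;
      while (values.hasNext()) {
        sum += values.next().get();
      }
      output.collect(key, new IntWritable(sum));
    }
  }

  // Plain main(), no ToolRunner/GenericOptionsParser: just take the two paths.
  public static void main(String[] args) throws IOException {
    JobConf conf = new JobConf(WordCount.class);
    conf.setJobName("wordcount");

    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);

    conf.setMapperClass(Map.class);
    conf.setCombinerClass(Reduce.class);
    conf.setReducerClass(Reduce.class);

    conf.setInputPath(new Path(args[0]));
    conf.setOutputPath(new Path(args[1]));

    JobClient.runJob(conf);
  }
}
{code}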


Overall comments: This is extremely useful. However, the level of detail is
overwhelming for a MapReduce tutorial. Maybe split this into two? Basic and
Advanced. Basic should be enough to understand WordCount, and Advanced should
then go into all the details.

> Need configuration guides for Hadoop
> ------------------------------------
>
>                 Key: HADOOP-1917
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1917
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 0.14.1
>            Reporter: Sameer Paranjpye
>            Assignee: Arun C Murthy
>            Priority: Critical
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-1917_1_20071025.patch, 
> HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch
>
>
> We've recently had a spate of questions on the users list regarding features 
> such as rack-awareness, the trash can etc. which are not clearly documented 
> from a user/admins perspective. There is some Javadoc present but most of the 
> "documentation" exists either in JIRA or in the default config files 
> themselves.
> We should generate top down configuration and use guides for map/reduce and 
> HDFS. These should probably be in forest and accessible from the project 
> website (Javadoc isn't always approachable to our non-programmer audience). 
> Committers should look for user documentation before accepting patches.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
