[ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539171 ]
Milind Bhandarkar commented on HADOOP-1917:
-------------------------------------------
Comments on HADOOP-1917
Overview.html:
"Hadoop was been" -> "Hadoop has been"
"Optionally install rsync must be installed" -> "Optionally install rsync"
"build it with ant" -> *what's the ant target?*
What's the default for HADOOP_LOG_DIR?
"$ bin/hadoop dfs -put input input" -> "$ bin/hadoop dfs -put conf input"
Should there be a step to examine the web UIs for the JobTracker (JT) and NameNode (NN)?
setup.html:
HADOOP_HEAPSIZE -> *need some typical values here?*
"where the NameNode stores the name table" -> "where the NameNode persistently stores the
namespace and transaction logs"
"server and client machines." -> *need to document early on that the NameNode and
JobTracker run on server machines, and DataNode+TaskTracker run on client machines*
"slave processors" -> *please use consistent terminology; prefer "worker" to
"slave"*
*Argh... the name "slaves" is hard-coded as the file name conf/slaves in Hadoop. I
should probably file a JIRA.*
Also, mapred.map.tasks and mapred.reduce.tasks should *not* be marked final in
typical cases.
mapred_tutorial.html:
*Consider removing the Google MapReduce paper link as a prerequisite, since the goal
of the tutorial is to provide all the information needed to understand
Map-Reduce.*
*A picture would help in the overview.*
*In the Input and Output section, remove the use of combiner.*
*In the wordcount example, simplify it even more by avoiding the use of
ToolRunner*
"submission amp;" -> "submission and"
"de-initialization" -> "finalization? clean-up?"
*Wherever overriding is mentioned, also mention the default, e.g. for
Partitioner, InputFormat, InputSplit, etc.*
*Please provide a Javadoc link to DistributedCache at its first mention.*
Overall comments: This is extremely useful. However, the level of detail is
overwhelming for a Map-Reduce tutorial. Maybe split it into two parts, Basic and
Advanced? Basic should be enough to understand WordCount, and Advanced could
then go into all the details.
> Need configuration guides for Hadoop
> ------------------------------------
>
> Key: HADOOP-1917
> URL: https://issues.apache.org/jira/browse/HADOOP-1917
> Project: Hadoop
> Issue Type: Improvement
> Components: conf
> Affects Versions: 0.14.1
> Reporter: Sameer Paranjpye
> Assignee: Arun C Murthy
> Priority: Critical
> Fix For: 0.16.0
>
> Attachments: HADOOP-1917_1_20071025.patch,
> HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch
>
>
> We've recently had a spate of questions on the users list regarding features
> such as rack awareness, the trash can, etc., which are not clearly documented
> from a user/admin perspective. There is some Javadoc present, but most of the
> "documentation" exists either in JIRA or in the default config files
> themselves.
> We should generate top-down configuration and usage guides for Map/Reduce and
> HDFS. These should probably be in Forrest and accessible from the project
> website (Javadoc isn't always approachable to our non-programmer audience).
> Committers should look for user documentation before accepting patches.