[
https://issues.apache.org/jira/browse/HADOOP-2046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12535725
]
Doug Cutting commented on HADOOP-2046:
--------------------------------------
Overall this looks great. A few comments:
- In Configuration.java, the first use of 'final' should be in italics, not
bold, and the anchors in the headers should be done with <h4 id=foo>Foo</h4>.
I also find the links to String and Path mostly just introduce noise. We might
make the first reference to Path a link, but leave the rest as plain text: no
one is going to click on that link to find out what a Java String is, nor do we
need more than a single link to Path.
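For illustration, the 'id='-based anchor style might look like the following javadoc fragment (the section name "Final Parameters" and the anchor id "FinalParams" are hypothetical, not taken from the patch):

```java
/**
 * ...
 * <h4 id="FinalParams">Final Parameters</h4>
 *
 * Configuration parameters may be declared <i>final</i> so that later
 * resources cannot override them; elsewhere in the document one can then
 * link back with <a href="#FinalParams">Final Parameters</a>.
 */
```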
- In JobClient.java, the anchors should be implemented with 'id='. We should
not mention HDFS here: the system directory could be in, e.g., KFS. I would
also leave the internally used file names "job.jar" and "job.xml" out of this
description. The list of things done should include 'submission of the job to
the jobtracker'. The steps you list are all preparations for that, but we
don't want to forget that crucial step. In the list of ways to handle job
sequencing, it should be made clearer that these are alternatives: one
should choose just one method. Also, should we mention the jobcontrol stuff
here?
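A sketch of how those alternatives might be contrasted in the javadoc (sketch only, assuming the 0.15-era org.apache.hadoop.mapred API; not self-contained, and the driver class MyJob is hypothetical):

```java
JobConf conf = new JobConf(MyJob.class);  // MyJob: hypothetical driver class

// Alternative 1: submit the job and block until it completes.
JobClient.runJob(conf);

// Alternative 2: submit asynchronously, then poll for completion.
RunningJob rj = new JobClient(conf).submitJob(conf);
while (!rj.isComplete()) {
  Thread.sleep(5000);
}

// Alternative 3: chain dependent jobs via
// org.apache.hadoop.mapred.jobcontrol.JobControl.
```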
- In JobConf.java: the JobConf isn't XML. It can be serialized as XML, but
it's fundamentally a Map<String,String>, a Configuration. We also have anchors
that should use 'id=' here, and mentions of HDFS that should instead just be
to FileSystem (all FileSystems have a block size, which is used to generate
splits). And, instead of 'default InputFormat' we should say 'standard
file-based InputFormats'. We should probably also include something at the
top level in this class about the determination of the job jar file.
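To make the Map-not-XML point concrete: a minimal, self-contained sketch of the idea, using java.util.Properties as a stand-in (this is NOT Hadoop's API; the keys are hypothetical job settings). The pairs live as a String-to-String map; XML is just one way to serialize them.

```java
import java.io.ByteArrayOutputStream;
import java.util.Properties;

public class ConfAsMap {
    public static void main(String[] args) throws Exception {
        // Stand-in for a Map<String,String>-style configuration.
        Properties conf = new Properties();
        conf.setProperty("mapred.reduce.tasks", "2");       // hypothetical key
        conf.setProperty("mapred.job.name", "word-count");  // hypothetical key

        // The same key/value pairs can be written out as XML.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        conf.storeToXML(out, "job configuration");
        System.out.println(out.toString("UTF-8"));
    }
}
```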
> Documentation: Hadoop Install/Configuration Guide and Map-Reduce User Manual
> ----------------------------------------------------------------------------
>
> Key: HADOOP-2046
> URL: https://issues.apache.org/jira/browse/HADOOP-2046
> Project: Hadoop
> Issue Type: Improvement
> Components: documentation
> Affects Versions: 0.14.2
> Reporter: Arun C Murthy
> Assignee: Arun C Murthy
> Priority: Critical
> Fix For: 0.15.0
>
> Attachments: HADOOP-2046_1_20071018.patch
>
>
> I'd like to put forward some thoughts on how to structure reasonably detailed
> documentation for hadoop.
> Essentially I think of at least 3 different profiles to target:
> * hadoop-dev, folks who are actively involved improving/fixing hadoop.
> * hadoop-user
> ** mapred application writers and/or folks who directly use hdfs
> ** hadoop cluster administrators
> For this issue, I'd like to first target the latter category (admin and
> hdfs/mapred user) - which, arguably, offers the biggest bang for the buck, right
> now.
> There is a crying need to get user-level stuff documented, judging by the
> sheer number of emails we get on the hadoop lists...
> ----
> *1. Installing/Configuration Guides*
> This set of documents caters to folks ranging from someone just playing with
> hadoop on a single-node to operations teams who administer hadoop on several
> nodes (thousands). To ensure we cover all bases I'm thinking along the lines
> of:
> * _Download, install and configure hadoop_ on a single-node cluster:
> including a few comments on how to run examples (word-count) etc.
> * *Admin Guide*: Install and configure a real, distributed cluster.
> * *Tune Hadoop*: Separate sections on how to tune hdfs and map-reduce,
> targeting power admins/users.
> I reckon most of this would be done via forrest, with appropriate links to
> javadoc.
> ----
> *2. User Manual*
> This set is geared for people who use hdfs and/or map-reduce per se. Stuff to
> document:
> * Write a really simple mapred application, just fitting the blocks together
> i.e. maybe a walk-through of a couple of examples like word-count, sort etc.
> * Detailed information on important map-reduce user-interfaces:
> *- JobConf
> *- JobClient
> *- Tool & ToolRunner
> *- InputFormat
> *-- InputSplit
> *-- RecordReader
> *- Mapper
> *- Reducer
> *- Reporter
> *- OutputCollector
> *- Writable
> *- WritableComparable
> *- OutputFormat
> *- DistributedCache
> * SequenceFile
> *- Compression types: NONE, RECORD, BLOCK
> * Hadoop Streaming
> * Hadoop Pipes
> I reckon most of this would end up in the javadocs, specifically
> package.html and some via forrest.
> ----
> Also, as discussed in HADOOP-1881, it would be quite useful to maintain
> documentation per-release, even on the hadoop website i.e. we could have a
> main documentation page link to documentation per-release and to the trunk.
> ----
> Thoughts?