[ https://issues.apache.org/jira/browse/HADOOP-2046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-2046:
----------------------------------

    Attachment: HADOOP-2046_4_20071025.patch

Updated patch.

I'll file another jira for some more documentation via forrest, which allows 
these to go into 0.15.0. The forrest docs are almost done too, but they don't 
have to block 0.15.0 since the hadoop website is built from trunk and can be 
updated as soon as that patch goes in.

> Documentation: Hadoop Install/Configuration Guide and Map-Reduce User Manual
> ----------------------------------------------------------------------------
>
>                 Key: HADOOP-2046
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2046
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: documentation
>    Affects Versions: 0.14.2
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>            Priority: Blocker
>             Fix For: 0.15.0
>
>         Attachments: HADOOP-2046_1_20071018.patch, 
> HADOOP-2046_2_20071022.patch, HADOOP-2046_3_20071023.patch, 
> HADOOP-2046_4_20071025.patch
>
>
> I'd like to put forward some thoughts on how to structure reasonably detailed 
> documentation for hadoop.
> Essentially I think of at least 3 different profiles to target:
> * hadoop-dev, folks who are actively involved in improving/fixing hadoop.
> * hadoop-user
> ** mapred application writers and/or folks who directly use hdfs
> ** hadoop cluster administrators
> For this issue, I'd like to first target the latter category (admin and 
> hdfs/mapred user), which, arguably, is where the biggest bang for the buck 
> is right now.
> There is a crying need to get user-level stuff documented, judging by the 
> sheer number of emails we get on the hadoop lists...
> ----
> *1. Install/Configuration Guides*
> This set of documents caters to folks ranging from someone just playing with 
> hadoop on a single node to operations teams who administer hadoop on 
> clusters of several thousand nodes. To ensure we cover all bases I'm 
> thinking along the lines of:
> * _Download, install and configure hadoop_ on a single-node cluster, 
> including a few notes on how to run the examples (word-count etc.); see the 
> sketch below.
> * *Admin Guide*: Install and configure a real, distributed cluster. 
> * *Tune Hadoop*: Separate sections on how to tune hdfs and map-reduce, 
> targeting power admins/users.
> I reckon most of this would be done via forrest, with appropriate links to 
> javadoc.
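> To give a flavour of what the single-node guide would cover, here is a rough 
> sketch of a client-side smoke-test against a pseudo-distributed setup. It 
> assumes the usual quickstart example values fs.default.name=localhost:9000 
> and mapred.job.tracker=localhost:9001 in conf/hadoop-site.xml; actual 
> hosts/ports are site-specific, and exact API details should be checked 
> against the release's javadoc.
> {code:java}
> import java.io.IOException;
>
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
>
> // Minimal smoke-test of a single-node (pseudo-distributed) setup.
> // Assumes the hdfs/mapred daemons are already running; the values set
> // below are the usual quickstart examples, not required settings.
> public class SingleNodeCheck {
>   public static void main(String[] args) throws IOException {
>     Configuration conf = new Configuration();
>     // Normally picked up from conf/hadoop-site.xml; set here only to make
>     // the assumed values explicit.
>     conf.set("fs.default.name", "localhost:9000");
>     conf.set("mapred.job.tracker", "localhost:9001");
>
>     FileSystem fs = FileSystem.get(conf);   // connects to the namenode
>     Path userDir = new Path("/user");
>     System.out.println(userDir + " exists? " + fs.exists(userDir));
>   }
> }
> {code}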
> ----
> *2. User Manual*
> This set is geared towards people who use hdfs and/or map-reduce per se. 
> Stuff to document:
> * Write a really simple mapred application, just fitting the blocks 
> together, i.e. maybe a walk-through of a couple of examples like word-count, 
> sort etc.; see the word-count sketch below.
> * Detailed information on important map-reduce user-interfaces:
> ** JobConf
> ** JobClient
> ** Tool & ToolRunner
> ** InputFormat
> *** InputSplit
> *** RecordReader
> ** Mapper
> ** Reducer
> ** Reporter
> ** OutputCollector
> ** Writable
> ** WritableComparable
> ** OutputFormat
> ** DistributedCache
> * SequenceFile
> ** Compression types: NONE, RECORD, BLOCK
> * Hadoop Streaming
> * Hadoop Pipes
> I reckon most of this would end up in the javadocs, specifically 
> package.html, and some via forrest.
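> To give a flavour of the walk-through, here is a rough word-count sketch 
> against the org.apache.hadoop.mapred API of this vintage (non-generic 
> Mapper/Reducer); it is only an illustration of what the manual would 
> explain, and exact signatures should be checked against the release's 
> javadoc.
> {code:java}
> import java.io.IOException;
> import java.util.Iterator;
> import java.util.StringTokenizer;
>
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.io.IntWritable;
> import org.apache.hadoop.io.Text;
> import org.apache.hadoop.io.Writable;
> import org.apache.hadoop.io.WritableComparable;
> import org.apache.hadoop.mapred.JobClient;
> import org.apache.hadoop.mapred.JobConf;
> import org.apache.hadoop.mapred.MapReduceBase;
> import org.apache.hadoop.mapred.Mapper;
> import org.apache.hadoop.mapred.OutputCollector;
> import org.apache.hadoop.mapred.Reducer;
> import org.apache.hadoop.mapred.Reporter;
>
> public class WordCount {
>
>   // Emits <word, 1> for every token in an input line.
>   public static class MapClass extends MapReduceBase implements Mapper {
>     private final static IntWritable one = new IntWritable(1);
>     private Text word = new Text();
>
>     public void map(WritableComparable key, Writable value,
>                     OutputCollector output, Reporter reporter)
>         throws IOException {
>       StringTokenizer itr = new StringTokenizer(((Text) value).toString());
>       while (itr.hasMoreTokens()) {
>         word.set(itr.nextToken());
>         output.collect(word, one);
>       }
>     }
>   }
>
>   // Sums the counts for each word; also used as the combiner.
>   public static class Reduce extends MapReduceBase implements Reducer {
>     public void reduce(WritableComparable key, Iterator values,
>                        OutputCollector output, Reporter reporter)
>         throws IOException {
>       int sum = 0;
>       while (values.hasNext()) {
>         sum += ((IntWritable) values.next()).get();
>       }
>       output.collect(key, new IntWritable(sum));
>     }
>   }
>
>   public static void main(String[] args) throws IOException {
>     JobConf conf = new JobConf(WordCount.class);
>     conf.setJobName("wordcount");
>
>     conf.setOutputKeyClass(Text.class);
>     conf.setOutputValueClass(IntWritable.class);
>
>     conf.setMapperClass(MapClass.class);
>     conf.setCombinerClass(Reduce.class);
>     conf.setReducerClass(Reduce.class);
>
>     conf.setInputPath(new Path(args[0]));
>     conf.setOutputPath(new Path(args[1]));
>
>     JobClient.runJob(conf);
>   }
> }
> {code}
> It would be run with something like "bin/hadoop jar wordcount.jar WordCount 
> in-dir out-dir"; streaming and pipes would get analogous walk-throughs in 
> their own sections.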
> ----
> Also, as discussed in HADOOP-1881, it would be quite useful to maintain 
> documentation per-release, even on the hadoop website, i.e. we could have 
> the main documentation page link to per-release documentation and to trunk.
> ----
> Thoughts?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
