[jira] Commented: (HADOOP-1917) Need configuration guides for Hadoop

Nigel Daley (JIRA) Sun, 04 Nov 2007 21:55:11 -0800

    [ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12540134 ]


Nigel Daley commented on HADOOP-1917:
-------------------------------------

Ok, final set of comments on the tutorial:

Application typically implement -> 
Applications typically implement

These represent the core -> 
These form the core

<code>Mapper</code> implementations can access the <code>JobConf</code> ... -> 
<code>Mapper</code> implementations are passed the <code>JobConf</code> via the 
... (discuss the ordering guarantees of the calls made to the Mapper methods: 
configure, map, close)

"de-initialization" -> "finalization" or "tear down" or "cleanup"

(the above 2 comments also apply to the Reducer section)
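
For the ordering discussion requested above (configure, map, close), a small sketch might make the guarantee concrete for readers. This assumes the usual sequence: configure once before any record, map once per input record, close once after the last record. The SketchMapper interface and the runTask driver are hypothetical stand-ins so the example compiles without Hadoop on the classpath; they are not the real org.apache.hadoop.mapred types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LifecycleSketch {
    // Hypothetical stand-in for the Mapper lifecycle; not the real Hadoop interface.
    interface SketchMapper {
        void configure(String jobConf);      // expected once, before any record
        void map(String key, String value);  // expected once per input record
        void close();                        // expected once, after the last record
    }

    // Records every lifecycle call so the ordering can be inspected.
    static class RecordingMapper implements SketchMapper {
        final List<String> calls = new ArrayList<>();
        public void configure(String jobConf) { calls.add("configure"); }
        public void map(String key, String value) { calls.add("map:" + key); }
        public void close() { calls.add("close"); }
    }

    // Assumed driver loop: configure, then map per record, then close.
    // The finally block means close still runs if a map call throws.
    static List<String> runTask(List<String> keys) {
        RecordingMapper mapper = new RecordingMapper();
        mapper.configure("job.xml"); // placeholder for the job configuration
        try {
            for (String key : keys) {
                mapper.map(key, "value");
            }
        } finally {
            mapper.close();
        }
        return mapper.calls;
    }

    public static void main(String[] args) {
        // Prints the recorded call order for two input records.
        System.out.println(runTask(Arrays.asList("k1", "k2")));
    }
}
```

The same shape would apply to the Reducer section, with reduce in place of map.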

"The framework then calls" makes it sound like you were previously talking 
about the sequencing of calls (which I don't think you were)

"to report progress, status, counters and so on, or just indicate that they are 
alive" -> "to report progress, status, and counters" (it looks like that's all 
you can do with the Reporter interface)

(the above comment also applies to the Reducer section)

"The grouped <code>Mapper</code> outputs are partitioned per 
<code>Reducer</code>" (I think this concept needs more explanation as it's not 
obvious to the new user)
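
One way to explain the partitioning concept is a short sketch of hash-based assignment: each key is mapped to one of the reduce tasks, so every value for a given key ends up at the same Reducer. The code below is a standalone illustration of that idea and, as far as I know, mirrors the default hash-based scheme; the class and method names (PartitionSketch, partitionFor, assign) are made up for this example and are not Hadoop APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PartitionSketch {
    // Clear the sign bit so the result is non-negative, then take the
    // remainder by the number of reduce tasks to pick a partition.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Group keys by the partition (i.e. the reduce task) they are assigned to.
    static Map<Integer, List<String>> assign(List<String> keys, int numReduceTasks) {
        Map<Integer, List<String>> byPartition = new TreeMap<>();
        for (String key : keys) {
            int partition = partitionFor(key, numReduceTasks);
            byPartition.computeIfAbsent(partition, p -> new ArrayList<>()).add(key);
        }
        return byPartition;
    }

    public static void main(String[] args) {
        // Repeated keys always land in the same partition.
        List<String> keys = Arrays.asList("apple", "banana", "apple", "cherry");
        System.out.println(assign(keys, 3));
    }
}
```

The point for a new user is that the partition depends only on the key and the number of reduce tasks, which is why all values for one key are reduced together.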

which is only a hint -> 
which only provides a hint

conjunction to simulate -> 
conjunction to simulate a

If equivalence rules for keys while grouping the intermediates are different 
from those for grouping keys before reduction ->
If equivalence rules for grouping the intermediate keys are required to be 
different from those for grouping keys before reduction

<em>not re-sorted</em> -> 
<em>not sorted</em> by the framework

<code>zero</code> -> <em>zero</em>

is sent for reduction -> is sent to for reduction

possibly link to HashPartitioner javadoc

insignificant amount of time -> significant amount of time

even to <code>zero</code> -> even to <em>zero</em>
(as written, it looks like the user should do this:
mapred.task.timeout=zero
which is clearly wrong)

job-configuration -> job configuration

Should the job conf section describe how job configs can be set, i.e. via the 
command line, programmatically, config files, etc.?

record-oriented view for the -> 
record-oriented view to the

write out the output files ->
write the output files

Tasks' Side-Effect Files ->
Task Side-Effect Files

Some applications need ->
In some applications the tasks need

To avoid thes issues ->
To avoid these issues

completion of the task-attempt ->
completion of the task-attempt,

Applications specify the files, via urls (hdfs:// or http://) to be cached via 
the <code>JobConf</code> ->
Applications specify the files to be cached via urls (hdfs:// or http://) 
configured in the <code>JobConf</code>

are only copied once per job and the ability to cache archives which are 
un-archived on the slaves ->
are copied (and un-archived if necessary) only once per job on each slave


> Need configuration guides for Hadoop
> ------------------------------------
>
>                 Key: HADOOP-1917
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1917
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 0.14.1
>            Reporter: Sameer Paranjpye
>            Assignee: Arun C Murthy
>            Priority: Critical
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-1917_1_20071025.patch, 
> HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch, 
> HADOOP-1917_4_20071105.patch
>
>
> We've recently had a spate of questions on the users list regarding features 
> such as rack-awareness, the trash can, etc., which are not clearly documented 
> from a user/admin's perspective. There is some Javadoc present but most of the 
> "documentation" exists either in JIRA or in the default config files 
> themselves.
> We should generate top-down configuration and use guides for map/reduce and 
> HDFS. These should probably be in Forrest and accessible from the project 
> website (Javadoc isn't always approachable to our non-programmer audience). 
> Committers should look for user documentation before accepting patches.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
