[
https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12540134
]
Nigel Daley commented on HADOOP-1917:
-------------------------------------
Ok, final set of comments on the tutorial:
Application typically implement ->
Applications typically implement
These represent the core ->
These form the core
<code>Mapper</code> implementations can access the <code>JobConf</code> ... ->
<code>Mapper</code> implementations are passed the <code>JobConf</code> via the
... (discuss the ordering guarantees of the calls made to the Mapper methods:
configure, map, close)
"de-initialization" -> "finalization" or "tear down" or "cleanup"
(the above 2 comments also apply to the Reducer section)
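To make the ordering point concrete, here is a minimal sketch of the sequence the tutorial could spell out: configure once, map once per input record, close once at the end. It is plain Java and does not use the Hadoop classes. SketchMapper, RecordingMapper and runTask are hypothetical names invented for illustration, and a Map<String, String> stands in for the JobConf. Calling close in a finally block is an assumption about how a framework might guarantee cleanup, not a statement of what Hadoop does.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MapperLifecycleSketch {
    /** Hypothetical stand-in for the tutorial's Mapper; NOT the Hadoop interface. */
    interface SketchMapper {
        void configure(Map<String, String> jobConf);             // once, before any record
        void map(String key, String value, List<String> output); // once per input record
        void close();                                            // once, after the last record
    }

    /** Records each call so the ordering can be inspected afterwards. */
    static class RecordingMapper implements SketchMapper {
        final List<String> calls = new ArrayList<>();

        public void configure(Map<String, String> jobConf) {
            calls.add("configure");
        }

        public void map(String key, String value, List<String> output) {
            calls.add("map");
            output.add(key + "=" + value.toUpperCase());
        }

        public void close() {
            calls.add("close");
        }
    }

    /** Hypothetical driver: configure, then map per record, then close (assumed to run even on failure). */
    static List<String> runTask(SketchMapper mapper, Map<String, String> jobConf, List<String[]> records) {
        List<String> output = new ArrayList<>();
        mapper.configure(jobConf);
        try {
            for (String[] record : records) {
                mapper.map(record[0], record[1], output);
            }
        } finally {
            mapper.close();
        }
        return output;
    }

    public static void main(String[] args) {
        RecordingMapper mapper = new RecordingMapper();
        List<String[]> records = new ArrayList<>();
        records.add(new String[] {"k1", "a"});
        records.add(new String[] {"k2", "b"});
        List<String> output = runTask(mapper, new HashMap<>(), records);
        System.out.println(mapper.calls); // [configure, map, map, close]
        System.out.println(output);       // [k1=A, k2=B]
    }
}
```

The same shape would apply to the Reducer section, with reduce in place of map.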
"The framework then calls" makes it sound like you were previously talking
about the sequencing of calls (which I don't think you were)
"to report progress, status, counters and so on, or just indicate that they are
alive" -> "to report progress, status, and counters" (it looks like that's all
you can do with the Reporter interface)
(the above comment also applies to the Reducer section)
"The grouped <code>Mapper</code> outputs are partitioned per
<code>Reducer</code>" (I think this concept needs more explanation as it's not
obvious to the new user)
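As an illustration of the kind of explanation that might help a new user, here is a small sketch of hash-based partitioning: each key is mapped to one of numReduceTasks buckets, so every record with the same key ends up at the same reducer. It is plain Java rather than the Hadoop Partitioner interface, and PartitionSketch, partitionFor and partition are hypothetical names. The formula is the one I understand HashPartitioner to use, but that should be checked against its javadoc before it goes into the tutorial.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PartitionSketch {
    /**
     * Picks a partition for a key. Masking with Integer.MAX_VALUE clears the
     * sign bit, so the result is never negative even if hashCode() is.
     */
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    /** Groups keys into one bucket per (hypothetical) reduce task. */
    static List<List<String>> partition(List<String> keys, int numReduceTasks) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String key : keys) {
            buckets.get(partitionFor(key, numReduceTasks)).add(key);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("apple", "banana", "apple", "cherry");
        List<List<String>> buckets = partition(keys, 2);
        for (int i = 0; i < buckets.size(); i++) {
            // Both occurrences of "apple" always appear in the same bucket.
            System.out.println("reducer " + i + " <- " + buckets.get(i));
        }
    }
}
```

The point for the reader is that the partition depends only on the key and the number of reduce tasks, which is why all values for a given key reach a single Reducer.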
which is only a hint ->
which only provides a hint
conjunction to simulate ->
conjunction to simulate a
If equivalence rules for keys while grouping the intermediates are different
from those for grouping keys before reduction ->
If equivalence rules for grouping the intermediate keys are required to be
different from those for grouping keys before reduction
<em>not re-sorted</em> ->
<em>not sorted</em> by the framework
<code>zero</code> -> <em>zero</em>
is sent for reduction -> is sent to for reduction
possibly link to HashPartitioner javadoc
insignificant amount of time -> significant amount of time
even to <code>zero</code> -> even to <em>zero</em>
(as written, it looks like the user should do this:
mapred.task.timeout=zero
which is clearly wrong)
job-configuration -> job configuration
Should the job conf section describe how job configs can be set? i.e., command
line, programmatically, config files, etc.?
record-oriented view for the ->
record-oriented view to the
write out the output files ->
write the output files
Tasks' Side-Effect Files ->
Task Side-Effect Files
Some applications need ->
In some applications the tasks need
To avoid thes issues ->
To avoid these issues
completion of the task-attempt ->
completion of the task-attempt,
Applications specify the files, via urls (hdfs:// or http://) to be cached via
the <code>JobConf</code> ->
Applications specify the files to be cached via urls (hdfs:// or http://)
configured in the <code>JobConf</code>
are only copied once per job and the ability to cache archives which are
un-archived on the slaves ->
are copied (and un-archived if necessary) only once per job on each slave
> Need configuration guides for Hadoop
> ------------------------------------
>
> Key: HADOOP-1917
> URL: https://issues.apache.org/jira/browse/HADOOP-1917
> Project: Hadoop
> Issue Type: Improvement
> Components: conf
> Affects Versions: 0.14.1
> Reporter: Sameer Paranjpye
> Assignee: Arun C Murthy
> Priority: Critical
> Fix For: 0.16.0
>
> Attachments: HADOOP-1917_1_20071025.patch,
> HADOOP-1917_2_20071031.patch, HADOOP-1917_3_20071031.patch,
> HADOOP-1917_4_20071105.patch
>
>
> We've recently had a spate of questions on the users list regarding features
> such as rack-awareness, the trash can etc. which are not clearly documented
> from a user's/admin's perspective. There is some Javadoc present but most of the
> "documentation" exists either in JIRA or in the default config files
> themselves.
> We should generate top down configuration and use guides for map/reduce and
> HDFS. These should probably be in Forrest and accessible from the project
> website (Javadoc isn't always approachable to our non-programmer audience).
> Committers should look for user documentation before accepting patches.