Author: buildbot
Date: Tue Jun 3 15:43:41 2014
New Revision: 911105
Log:
Staging update by buildbot for crunch
Modified:
websites/staging/crunch/trunk/content/ (props changed)
websites/staging/crunch/trunk/content/user-guide.html
Propchange: websites/staging/crunch/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Tue Jun 3 15:43:41 2014
@@ -1 +1 @@
-1589941
+1599620
Modified: websites/staging/crunch/trunk/content/user-guide.html
==============================================================================
--- websites/staging/crunch/trunk/content/user-guide.html (original)
+++ websites/staging/crunch/trunk/content/user-guide.html Tue Jun 3 15:43:41 2014
@@ -463,8 +463,8 @@ framework won't kill it,</li>
<li><code>setStatus(String status)</code> and <code>getStatus</code> for
setting and retrieving task status information, and</li>
<li><code>getTaskAttemptID()</code> for accessing the current
<code>TaskAttemptID</code> information.</li>
</ul>
-<p>DoFns also have a number of helper methods for working with <a
href="http://codingwiththomas.blogspot.com/2011/04/controlling-hadoop-job-recursion.html">Hadoop
Counters</a>, all named <code>increment</code>. Counters are an incredibly
useful way of keeping track of the state of long running data pipelines and
detecting any exceptional conditions that
-occur during processing, and they are supported in both the MapReduce-based
and in-memory Crunch pipeline contexts. You can retrive the value of the
Counters
+<p>DoFns also have a number of helper methods for working with <a
href="http://codingwiththomas.blogspot.com/2011/04/controlling-hadoop-job-recursion.html">Hadoop
Counters</a>, all named <code>increment</code>. Counters are an incredibly
useful way of keeping track of the state of long-running data pipelines and
detecting any exceptional conditions that
+occur during processing, and they are supported in both the MapReduce-based
and in-memory Crunch pipeline contexts. You can retrieve the value of the
Counters
in your client code at the end of a MapReduce pipeline by getting them from
the <a
href="apidocs/0.9.0/org/apache/crunch/PipelineResult.StageResult.html">StageResult</a>
objects returned by Crunch at the end of a run.</p>
<ul>
@@ -474,7 +474,7 @@ objects returned by Crunch at the end of
<li><code>increment(Enum<?> counterName, long value)</code> increments
the value of the given counter by the given value.</li>
</ul>
<p>(Note that there was a change in the Counters API from Hadoop 1.0 to Hadoop
2.0, and thus we do not recommend that you work with the
-Counter classes directly in yoru Crunch pipelines (the two
<code>getCounter</code> methods that were defined in DoFn are both deprecated)
so that you will not be
+Counter classes directly in your Crunch pipelines (the two
<code>getCounter</code> methods that were defined in DoFn are both deprecated)
so that you will not be
required to recompile your job jars when you move from a Hadoop 1.0 cluster to
a Hadoop 2.0 cluster.)</p>
<p><a name="doplan"></a></p>
<h3
id="configuring-the-crunch-planner-and-mapreduce-jobs-with-dofns">Configuring
the Crunch Planner and MapReduce Jobs with DoFns</h3>
@@ -562,7 +562,7 @@ PTypes that can be constructed out of ot
call on a PCollection will be a PTable instead of a PCollection, and only the
PTable interface has the groupByKey method that
can be used to kick off a shuffle on the cluster.</p>
<pre>
- public static class InidicatorFn<T> extends MapFn<T, Pair<T,
Boolean>> {
+ public static class IndicatorFn<T> extends MapFn<T, Pair<T,
Boolean>> {
public Pair<T, Boolean> map(T input) { ... }
}
@@ -1129,7 +1129,7 @@ in the Apache Pig book.</p>
Crunch APIs have a number of utilities for performing fully distributed sorts
as well as
more advanced patterns like secondary sorts.</p>
<p><a name="stdsort"></a></p>
-<h4 id="standard-and-reveserse-sorting">Standard and Reveserse Sorting</h4>
+<h4 id="standard-and-reverse-sorting">Standard and Reverse Sorting</h4>
<p>The <a href="apidocs/0.9.0/org/apache/crunch/lib/Sort.html">Sort</a> API
methods contain utility functions
for sorting the contents of PCollections and PTables whose contents implement
the <code>Comparable</code>
interface. By default, MapReduce does not perform total sorts on its keys
during a shuffle; instead