Author: buildbot
Date: Mon Nov 25 05:21:07 2013
New Revision: 887976
Log:
Staging update by buildbot for crunch
Modified:
websites/staging/crunch/trunk/content/ (props changed)
websites/staging/crunch/trunk/content/intro.html
Propchange: websites/staging/crunch/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Mon Nov 25 05:21:07 2013
@@ -1 +1 @@
-1544354
+1545153
Modified: websites/staging/crunch/trunk/content/intro.html
==============================================================================
--- websites/staging/crunch/trunk/content/intro.html (original)
+++ websites/staging/crunch/trunk/content/intro.html Mon Nov 25 05:21:07 2013
@@ -250,7 +250,7 @@ return a type that has an associated obj
supports two serialization frameworks, called <em>type families</em>: one
based on Hadoop's <code>Writable</code> interface, and another based on
<code>Apache Avro</code>.
You can read more about how to work with Crunch's serialization libraries
here. TODO</p>
<p>Because all of the core logic in our application is exposed via a single
static method that operates on Crunch interfaces, we can use Crunch's
-in-memory API to test our business logic using a unit testing framework like
JUnit. Let's look at an exampel unit test for the word count
+in-memory API to test our business logic using a unit testing framework like
JUnit. Let's look at an example unit test for the word count
application:</p>
<div class="codehilite"><pre><span class="n">package</span> <span
class="n">org</span><span class="p">.</span><span class="n">myorg</span><span
class="p">;</span>
@@ -283,51 +283,55 @@ Collections Classes like <code>java.util
pipeline into the client and make decisions based on that data allows us to
create sophisticated analytical
applications that can modify their downstream processing based on the results
of upstream computations.</p>
<h3 id="data-model-and-operators">Data Model and Operators</h3>
-<p>The Java API is centered around three interfaces that represent distributed
datasets: <code>PCollection<T></code>, <code>PTable<K, V></code>,
and <code>PGroupedTable<K, V></code>.</p>
+<p>The Java API is centered around three interfaces that represent distributed
datasets: <a
href="apidocs/current/org/apache/crunch/PCollection.html">PCollection<T></a>,
+<a
href="http://crunch.apache.org/apidocs/current/org/apache/crunch/PTable.html">PTable<K,
V></a>, and <a
href="apidocs/current/org/apache/crunch/PGroupedTable.html">PGroupedTable<K,
V></a>.</p>
<p>A <code>PCollection<T></code> represents a distributed, unordered
collection of elements of type T. For example, we represent a text file as a
-<code>PCollection<String></code> object. PCollection provides a method,
<code>parallelDo</code>, that applies a <code>DoFn</code> to each element in a
PCollection in parallel,
-and returns a new PCollection as its result. </p>
-<p>A <code>PTable<K, V></code> is a sub-interface of PCollection that
represents a distributed, unordered multimap of its key type K to its value
type V.
+<code>PCollection<String></code> object.
<code>PCollection<T></code> provides a method, <code>parallelDo</code>,
that applies a <a href="apidocs/current/org/apache/crunch/DoFn.html">DoFn<T,
U></a>
+to each element in the <code>PCollection<T></code> in parallel, and
returns a new <code>PCollection<U></code> as its result.</p>
+<p>A <code>PTable<K, V></code> is a sub-interface of
<code>PCollection<Pair<K, V>></code> that represents a distributed,
unordered multimap of its key type K to its value type V.
In addition to the parallelDo operation, PTable provides a
<code>groupByKey</code> operation that aggregates all of the values in the
PTable that
-have the same key into a single record. It is the groupByKey operation that
triggers the sort phase of a MapReduce job.</p>
-<p>The result of a groupByKey operation is a <code>PGroupedTable<K,
V></code> object, which is a distributed, sorted map of keys of type K to an
Iterable
-collection of values of type V. In addition to parallelDo, the PGroupedTable
provides a <code>combineValues</code> operation, which allows for
-a commutative and associative aggregation operator to be applied to the values
of the PGroupedTable instance on both the map side and the
-reduce side of a MapReduce job.</p>
+have the same key into a single record. It is the groupByKey operation that
triggers the sort phase of a MapReduce job. Developers can exercise
+fine-grained control over the number of reducers and the partitioning,
grouping, and sorting strategies used during the shuffle by providing an
instance
+of the <a
href="apidocs/current/org/apache/crunch/GroupingOptions.html">GroupingOptions</a>
class to the <code>groupByKey</code> function.</p>
+<p>The result of a groupByKey operation is a <code>PGroupedTable<K,
V></code> object, which is a distributed, sorted map of keys of type K to an
Iterable<V> that may
+be iterated over exactly once. In addition to <code>parallelDo</code>
processing via DoFns, PGroupedTable provides a <code>combineValues</code>
operation that allows a
+commutative and associative <a
href="apidocs/current/org/apache/crunch/Aggregator.html">Aggregator<V></a> to
be applied to the values of the PGroupedTable
+instance on both the map and reduce sides of the shuffle. A number of common
<code>Aggregator<V></code> implementations are provided in the
+<a
href="apidocs/current/org/apache/crunch/fn/Aggregators.html">Aggregators</a>
class.</p>
<p>Finally, PCollection, PTable, and PGroupedTable all support a
<code>union</code> operation, which takes a series of distinct PCollections
that all have
the same data type and treats them as a single virtual PCollection.</p>
-<p>All of the other MapReduce patterns supported by the Crunch APIs
(aggregations, joins, sorts, secondary sorts, and cogrouping) are all
implemented
-in terms of these four primitives. The patterns themselves are defined in the
<code>org.apache.crunch.lib</code> package and its children, and a few of
-the most common patterns have convenience functions defined on the PCollection
and PTable interfaces. We will do a more detailed review of these
-patterns later in this document, but here are a few examples to get you
started: TODO</p>
+<p>All of the other data transformation operations supported by the Crunch
APIs (aggregations, joins, sorts, secondary sorts, and cogrouping) are
implemented
+in terms of these four primitives. The patterns themselves are defined in the
<a
href="apidocs/current/org/apache/crunch/lib/package-summary.html">org.apache.crunch.lib</a>
+package and its children, and a few of the most common patterns have
convenience functions defined on the PCollection and PTable interfaces.</p>
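+<p>To give a feel for how these primitives compose, here is a minimal word-count-style sketch. It is only an
+illustration: the input and output paths and the <code>MyApp</code> class name are placeholders, and it assumes
+the Writable type family.</p>
+<div class="codehilite"><pre>
+// Sketch only: MyApp and the file paths are placeholders.
+import org.apache.crunch.*;
+import org.apache.crunch.fn.Aggregators;
+import org.apache.crunch.impl.mr.MRPipeline;
+import org.apache.crunch.types.writable.Writables;
+
+public class MyApp {
+  public static void main(String[] args) {
+    Pipeline pipeline = new MRPipeline(MyApp.class);
+    PCollection<String> lines = pipeline.readTextFile("/path/to/input");
+
+    // parallelDo applies a DoFn to every element and yields a new collection;
+    // here the DoFn emits a (word, 1) pair for each word, so the result is a PTable.
+    PTable<String, Long> ones = lines.parallelDo(new DoFn<String, Pair<String, Long>>() {
+      public void process(String line, Emitter<Pair<String, Long>> emitter) {
+        for (String word : line.split("\\s+")) {
+          emitter.emit(Pair.of(word, 1L));
+        }
+      }
+    }, Writables.tableOf(Writables.strings(), Writables.longs()));
+
+    // groupByKey triggers the shuffle; combineValues aggregates on both sides of it.
+    PTable<String, Long> counts = ones.groupByKey().combineValues(Aggregators.SUM_LONGS());
+
+    pipeline.writeTextFile(counts, "/path/to/output");
+    pipeline.done();
+  }
+}
+</pre></div>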
<h3 id="writing-dofns">Writing DoFns</h3>
<p>DoFns represent the logical computations of your Crunch pipelines. They are
designed to be easy to write, easy to test, and easy to deploy
within the context of a MapReduce job. Much of your work with the Crunch APIs
will be writing DoFns, and so having a good understanding of
how to use them effectively is critical to crafting elegant and efficient
pipelines.</p>
<h4 id="dofn-extends-serializable">DoFn extends Serializable</h4>
<p>The most important thing to remember about DoFns is that they all implement
the <code>java.io.Serializable</code> interface, which means that all of the
-state information associated with a DoFn must also be serializable. There is
an excellent overview of Java serializability here that is worth
-reviewing if you aren't familiar with Java's serializability model. TODO</p>
-<p>If your DoFn needs to work with a class that does not implement
Serializable and cannot be modified (e.g., because it is defined in a
third-party
+state information associated with a DoFn must also be serializable. There is
an <a
href="http://docs.oracle.com/javase/tutorial/jndi/objects/serial.html">excellent
overview of Java serializability</a> that is worth reviewing if you aren't
familiar with it already.</p>
+<p>If your DoFn needs to work with a class that does not implement
Serializable and cannot be modified (for example, because it is defined in a
third-party
library), you should use the <code>transient</code> keyword on that member
variable so that serializing the DoFn won't fail if that object happens to be
defined. You can create an instance of the object during runtime using the
<code>initialize</code> method described in the following section.</p>
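+<p>As a rough sketch of this pattern (the <code>ThirdPartyParser</code> class below is just a stand-in for any
+non-serializable library type):</p>
+<div class="codehilite"><pre>
+import org.apache.crunch.DoFn;
+import org.apache.crunch.Emitter;
+
+public class ParsingFn extends DoFn<String, String> {
+  // Marked transient so that serializing the DoFn does not try to serialize it.
+  private transient ThirdPartyParser parser;
+
+  @Override
+  public void initialize() {
+    // Re-create the non-serializable object on the cluster at runtime.
+    parser = new ThirdPartyParser();
+  }
+
+  @Override
+  public void process(String input, Emitter<String> emitter) {
+    emitter.emit(parser.parse(input));
+  }
+}
+</pre></div>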
<h4 id="runtime-processing-steps">Runtime Processing Steps</h4>
<p>After the Crunch runtime loads the serialized DoFns into its map and reduce
tasks, the DoFns are executed on the input data via the following
sequence:</p>
-<h1
id="first-the-dofn-is-given-access-to-the-taskinputoutputcontext-implementation-for-the-current-task-this-allows-the-dofn-to-access-any">First,
the DoFn is given access to the <code>TaskInputOutputContext</code>
implementation for the current task. This allows the DoFn to access any</h1>
-<p>necessary configuration and runtime information needed before or during
processing.</p>
-<h1
id="next-the-dofns-initialize-method-is-called-the-initialize-method-is-similar-to-the-setup-method-used-in-the-mapper-and-reducer-classes">Next,
the DoFn's <code>initialize</code> method is called. The initialize method is
similar to the <code>setup</code> method used in the Mapper and Reducer
classes;</h1>
-<p>it is called before processing begins in order to enable any necessary
initialization or configuration of the DoFn to be performed. For example,
-if we were making use of a non-serializable third-party library, we would
create an instance of it here.</p>
-<h1
id="at-this-point-data-processing-begins-the-map-or-reduce-task-will-begin-passing-records-in-to-the-dofns-process-method-and-capturing-the">At
this point, data processing begins. The map or reduce task will begin passing
records in to the DoFn's <code>process</code> method, and capturing the</h1>
-<p>output of the process method into an <code>Emitter<T></code> that can
either pass the data along to another DoFn for processing or serialize it as
the output
-of the current processing stage.</p>
-<h1
id="finally-after-all-of-the-records-have-been-processed-the-void-cleanupemittert-emitter-method-is-called-on-each-dofn-the-cleanup-method">Finally,
after all of the records have been processed, the <code>void
cleanup(Emitter<T> emitter)</code> method is called on each DoFn. The
cleanup method</h1>
-<p>has a dual purpose: it can be used to emit any state information that the
DoFn wants to pass along to the next stage (for example, cleanup could
+<ol>
+<li>First, the DoFn is given access to the <code>TaskInputOutputContext</code>
implementation for the current task. This allows the DoFn to access any
+necessary configuration and runtime information needed before or during
processing.</li>
+<li>Next, the DoFn's <code>initialize</code> method is called. The initialize
method is similar to the <code>setup</code> method used in the Mapper and
Reducer classes;
+it is called before processing begins in order to enable any necessary
initialization or configuration of the DoFn to be performed. For example,
+if we were making use of a non-serializable third-party library, we would
create an instance of it here.</li>
+<li>At this point, data processing begins. The map or reduce task will begin
passing records into the DoFn's <code>process</code> method, and capturing the
+output of the process method into an <code>Emitter<T></code> that can
either pass the data along to another DoFn for processing or serialize it as
the output
+of the current processing stage.</li>
+<li>Finally, after all of the records have been processed, the <code>void
cleanup(Emitter<T> emitter)</code> method is called on each DoFn. The
cleanup method
+has a dual purpose: it can be used to emit any state information that the DoFn
wants to pass along to the next stage (for example, cleanup could
be used to emit the sum of a list of numbers that was passed in to the DoFn's
process method), as well as to release any resources or perform any
-other cleanup task that is appropriate once the job has finished executing.</p>
+other cleanup task that is appropriate once the job has finished executing (see the sketch that follows this
+list).</li>
+</ol>
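+<p>As an illustration of this sequence, here is a small sketch of a DoFn that accumulates a running total in its
+process method and only emits it from cleanup, along the lines of the example mentioned in the last step:</p>
+<div class="codehilite"><pre>
+import org.apache.crunch.DoFn;
+import org.apache.crunch.Emitter;
+
+public class TotalFn extends DoFn<Long, Long> {
+  private long sum;
+
+  @Override
+  public void initialize() {
+    sum = 0L;  // set up state before any records are processed
+  }
+
+  @Override
+  public void process(Long input, Emitter<Long> emitter) {
+    sum += input;  // accumulate; nothing is emitted per record
+  }
+
+  @Override
+  public void cleanup(Emitter<Long> emitter) {
+    emitter.emit(sum);  // pass the accumulated state along to the next stage
+  }
+}
+</pre></div>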
<h4 id="accessing-runtime-mapreduce-apis">Accessing Runtime MapReduce APIs</h4>
-<p>DoFns provide direct access to the <code>TaskInputOutputContext</code>
object that is used within a given Map or Reduce task via the protected
<code>getContext</code>
+<p>DoFns provide direct access to the <code>TaskInputOutputContext</code>
object that is used within a given Map or Reduce task via the
<code>getContext</code>
method. There are also a number of helper methods for working with the objects
associated with the TaskInputOutputContext, including the following (a brief example of their use appears after
this list):</p>
<ul>
<li><code>getConfiguration()</code> for accessing the
<code>Configuration</code> object that contains much of the detail about system
and user-specific parameters for a
@@ -337,57 +341,86 @@ framework won't kill it,</li>
<li><code>setStatus(String status)</code> and <code>getStatus</code> for
setting task status information, and</li>
<li><code>getTaskAttemptID()</code> for accessing the current
<code>TaskAttemptID</code> information.</li>
</ul>
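+<p>For example, a DoFn might use these helpers along the following lines; this is only a sketch, and the
+"my.app.threshold" configuration key is a made-up example parameter:</p>
+<div class="codehilite"><pre>
+import org.apache.crunch.DoFn;
+import org.apache.crunch.Emitter;
+
+public class ThresholdFn extends DoFn<Long, Long> {
+  private long threshold;
+
+  @Override
+  public void initialize() {
+    // "my.app.threshold" is a hypothetical user-defined parameter.
+    threshold = getConfiguration().getLong("my.app.threshold", 100L);
+  }
+
+  @Override
+  public void process(Long input, Emitter<Long> emitter) {
+    progress();  // let the framework know the task is still making progress
+    if (input > threshold) {
+      setStatus("Saw a value above the threshold");
+      emitter.emit(input);
+    }
+  }
+}
+</pre></div>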
-<p>Crunch provides a number of helper methods, all named
<code>increment</code> and having various signatures, for working with Hadoop
Counters.
-There was a change in the Counters API from Hadoop 1.0 to Hadoop 2.0, and thus
we do not recommend that you work with the <code>Counter</code> classes
-directly in your Crunch pipelines (the two <code>getCounter</code> methods
that were defined in DoFn are both deprecated) so that you will not be
-required to recompile your job jars when you move from a Hadoop 1.x cluster to
a Hadoop 2.x cluster.</p>
+<p>Crunch provides a number of helper methods for working with <a
href="http://codingwiththomas.blogspot.com/2011/04/controlling-hadoop-job-recursion.html">Hadoop
Counters</a>, all named <code>increment</code>. Counters are an incredibly
useful way of keeping track of the state of long-running data pipelines and
detecting any exceptional conditions that
+occur during processing, and they are supported in both the MapReduce-based
and in-memory Crunch pipeline contexts. You can retrieve the values of the
Counters
+in your client code at the end of a MapReduce pipeline by getting them from
the <a
href="apidocs/current/org/apache/crunch/PipelineResult.StageResult.html">StageResult</a>
+objects returned by Crunch at the end of a run.</p>
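+<p>As a rough sketch (the counter group and name used here are arbitrary, and the exact
+<code>StageResult</code> accessor for reading a counter value is an assumption worth checking against the API
+docs):</p>
+<div class="codehilite"><pre>
+// Inside a DoFn: count the records that are dropped. Group/name strings are arbitrary.
+public void process(String input, Emitter<String> emitter) {
+  if (input.isEmpty()) {
+    increment("MyApp", "EMPTY_RECORDS");
+    return;
+  }
+  emitter.emit(input);
+}
+
+// In the client, after the pipeline has run (accessor name assumed):
+PipelineResult result = pipeline.done();
+for (PipelineResult.StageResult stage : result.getStageResults()) {
+  System.out.println("Empty records: " + stage.getCounterValue("MyApp", "EMPTY_RECORDS"));
+}
+</pre></div>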
+<p>(Note that there was a change in the Counters API from Hadoop 1.0 to Hadoop 2.0. To avoid having to recompile
+your job jars when you move from a Hadoop 1.0 cluster to a Hadoop 2.0 cluster, we recommend that you do not work
+with the Counter classes directly in your Crunch pipelines; the two <code>getCounter</code> methods that were
+defined in DoFn are both deprecated.)</p>
<h4
id="configuring-the-crunch-planner-and-mapreduce-jobs-with-dofns">Configuring
the Crunch Planner and MapReduce Jobs with DoFns</h4>
<p>Although most of the DoFn methods are focused on runtime execution, there
are a handful of methods that are used during the planning phase
before a pipeline is converted into MapReduce jobs. The first of these
functions is <code>float scaleFactor()</code>, which should return a floating
point
value greater than 0.0f. You can override the scaleFactor method in your
custom DoFns in order to provide a hint to the Crunch planner about
-how much larger (or smaller) an input data set will become after passing
through the process method. If the groupByKey method is called without
+how much larger (or smaller) an input data set will become after passing
through the process method. If the <code>groupByKey</code> method is called
without
an explicit number of reducers provided, the planner will try to guess how
many reduce tasks should be used for the job based on the size of
-the input data, which is determined in part by using the scaleFactor
results.</p>
+the input data, which is determined in part by using the result of calling the
<code>scaleFactor</code> method on the DoFns in the processing path.</p>
<p>Sometimes, you may know that one of your DoFns has some unusual parameter
settings that need to be specified on any job that includes that
DoFn as part of its processing. A DoFn can modify the Hadoop Configuration
object that is associated with the MapReduce job it is assigned to
on the client before processing begins by overriding the <code>void
configure(Configuration conf)</code> method. For example, you might know that
the DoFn
will require extra memory settings to run, and so you could make sure that the
value of the <code>mapred.child.java.opts</code> argument had a large enough
memory setting for the DoFn's needs before the job was launched on the
cluster.</p>
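+<p>A sketch of a DoFn that overrides both of these planning-time hooks might look like the following; the memory
+setting is just an illustrative value:</p>
+<div class="codehilite"><pre>
+import org.apache.crunch.DoFn;
+import org.apache.crunch.Emitter;
+import org.apache.hadoop.conf.Configuration;
+
+public class ExpandingFn extends DoFn<String, String> {
+  @Override
+  public float scaleFactor() {
+    return 3.0f;  // hint to the planner: output is roughly three times the input size
+  }
+
+  @Override
+  public void configure(Configuration conf) {
+    conf.set("mapred.child.java.opts", "-Xmx2g");  // illustrative value only
+  }
+
+  @Override
+  public void process(String input, Emitter<String> emitter) {
+    for (String piece : input.split(",")) {
+      emitter.emit(piece);
+    }
+  }
+}
+</pre></div>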
-<h4 id="dofn-extensions-and-helper-classes">DoFn Extensions and Helper
Classes</h4>
+<h4 id="common-dofn-patterns">Common DoFn Patterns</h4>
<p>The Crunch APIs contain a number of useful subclasses of DoFn that handle
common data processing scenarios and are easier
-to write and test. The top-level <code>org.apache.crunch</code> package
contains three of the most important specializations, which we will
-discuss now. Each of these specialized DoFn implementations has associated
methods on the PCollection, PTable, and PGroupedTable
-interfaces to support these common data processing tasks.</p>
-<p>The simplest extension is the <code>FilterFn<T></code> class, which
defines a single abstract method, <code>boolean accept(T input)</code>. The
FilterFn can be applied
-to a <code>PCollection<T></code> by calling the
<code>filter(FilterFn<T> fn)</code> method, and will return a new
<code>PCollection<T></code> that only contains the elements
-of the input PCollection for which the accept method returned true. Note that
the filter function does not include a PType argument in its
+to write and test. The top-level <a
href="apidocs/current/org/apache/crunch/package-summary.html">org.apache.crunch</a>
package contains three
+of the most important specializations, which we will discuss now. Each of
these specialized DoFn implementations has associated methods
+on the PCollection, PTable, and PGroupedTable interfaces to support common
data processing steps.</p>
+<p>The simplest extension is the <a
href="apidocs/current/org/apache/crunch/FilterFn.html">FilterFn<T></a> class,
which defines a single abstract method, <code>boolean accept(T input)</code>.
+The FilterFn can be applied to a <code>PCollection<T></code> by calling
the <code>filter(FilterFn<T> fn)</code> method, and will return a new
<code>PCollection<T></code> that only contains
+the elements of the input PCollection for which the accept method returned
true. Note that the filter function does not include a PType argument in its
signature, because there is no change in the data type of the PCollection when
the FilterFn is applied. It is possible to compose new FilterFn
-instances by combining multiple FilterFns together using the <code>and</code>,
<code>or</code>, and <code>not</code> factory methods defined in the FilterFns
helper class.</p>
-<p>The second extension is the <code>MapFn<S, T></code> class, which
defines a single abstract method, <code>T map(S input)</code>. For simple
transform tasks in which
-every input record will have exactly one output, it's easy to test a MapFn by
verifying that a given input returns a given output. MapFns are
-also used by Crunch's data serialization libraries to map between serialized
data types (such as Writables or Avro records) and POJOs.</p>
+instances by combining multiple FilterFns together using the <code>and</code>,
<code>or</code>, and <code>not</code> factory methods defined in the
+<a href="apidocs/current/org/apache/crunch/fn/FilterFns.html">FilterFns</a>
helper class.</p>
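+<p>For instance, a brief sketch of a FilterFn and a composition built with the FilterFns helpers; the
+<code>lines</code> collection is assumed to be a <code>PCollection<String></code>, and the
+<code>NotCommentFn</code> class is another hypothetical FilterFn:</p>
+<div class="codehilite"><pre>
+import org.apache.crunch.FilterFn;
+import org.apache.crunch.PCollection;
+import org.apache.crunch.fn.FilterFns;
+
+public class NonEmptyFn extends FilterFn<String> {
+  @Override
+  public boolean accept(String input) {
+    return input != null && !input.isEmpty();
+  }
+}
+
+// Usage: keep lines that are non-empty and are not comments.
+PCollection<String> kept = lines.filter(FilterFns.and(new NonEmptyFn(), new NotCommentFn()));
+</pre></div>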
+<p>The second extension is the <a
href="apidocs/current/org/apache/crunch/MapFn.html">MapFn<S, T></a> class,
which defines a single abstract method, <code>T map(S input)</code>.
+For simple transform tasks in which every input record will have exactly one output, it's easy to test a MapFn by
+verifying that a given input returns a given output.</p>
<p>MapFns are also used in specialized methods on the PCollection and PTable
interfaces. <code>PCollection<V></code> defines the method
<code>PTable<K,V> by(MapFn<V, K> mapFn, PType<K>
keyType)</code> that can be used to create a PTable from a PCollection by
writing a
function that extracts the key (of type K) from the value (of type V)
contained in the PCollection. The by function only requires that the PType of
the key be given and constructs a <code>PTableType<K, V></code> from the
given key type and the PCollection's existing value type. <code>PTable<K,
V></code>, in turn,
has methods <code>PTable<K1, V> mapKeys(MapFn<K, K1> mapFn)</code>
and <code>PTable<K, V2> mapValues(MapFn<V, V2>)</code> that handle
the common case of converting
just one of the paired values in a PTable instance from one type to another
while leaving the other type the same.</p>
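+<p>As a brief sketch of these methods (assuming a <code>PCollection<String></code> of lines, the Writable type
+family, and that the PType of the new value type is also passed to <code>mapValues</code>):</p>
+<div class="codehilite"><pre>
+// Key each line by its first token, then upper-case the values.
+PTable<String, String> keyed = lines.by(new MapFn<String, String>() {
+  @Override
+  public String map(String line) {
+    return line.split("\\s+")[0];  // extract the key from the value
+  }
+}, Writables.strings());
+
+PTable<String, String> upper = keyed.mapValues(new MapFn<String, String>() {
+  @Override
+  public String map(String value) {
+    return value.toUpperCase();
+  }
+}, Writables.strings());
+</pre></div>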
-<p>The final top-level extension to DoFn is the <code>CombineFn<K,
V></code> class, which is used in conjunction with the
<code>combineValues</code> method defined on the
-PGroupedTable interface. CombineFns are used to represent the associative
operations that can be applied using the MapReduce Combiner concept in
-order to reduce the amount of data that is shipped over the network during the
shuffle. The CombineFn extension is different from the FilterFn and
-MapFn classes in that it does not define an abstract method for handling data
besides the default <code>process</code> method that any other DoFn would use;
-rather, extending the CombineFn class signals to the Crunch planner that the
logic contained in this class satisfies the conditions required for use
-with the MapReduce combiner. Crunch supports many types of these associative
patterns, such as sums, counts, and set unions, via the
<code>Aggregator<V></code> interface,
-which is defined right alongside the CombineFn class in the top-level
<code>org.apache.crunch</code> package. There are a number of implementations
of the Aggregator
-interface defined via static factory methods in the
<code>org.apache.crunch.fn.Aggregators</code> class.</p>
+<p>The final top-level extension to DoFn is the <a
href="apidocs/current/org/apache/crunch/CombineFn.html">CombineFn<K, V></a>
class, which is used in conjunction with
+the <code>combineValues</code> method defined on the PGroupedTable interface.
CombineFns are used to represent the associative operations that can be applied
using
the MapReduce Combiner concept in order to reduce the amount of data that is
shipped over the network during a shuffle.</p>
+<p>The CombineFn extension is different from the FilterFn and MapFn classes in
that it does not define an abstract method for handling data
+beyond the default <code>process</code> method that any other DoFn would use;
rather, extending the CombineFn class signals to the Crunch planner that the
logic
+contained in this class satisfies the conditions required for use with the
MapReduce combiner.</p>
+<p>Crunch supports many types of these associative patterns, such as sums,
counts, and set unions, via the <a
href="apidocs/current/org/apache/crunch/Aggregator.html">Aggregator<V></a>
+interface, which is defined right alongside the CombineFn class in the
top-level <code>org.apache.crunch</code> package. There are a number of
implementations of the Aggregator
+interface defined via static factory methods in the <a
href="apidocs/current/org/apache/crunch/fn/Aggregators.html">Aggregators</a>
class.</p>
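+<p>For example, a rough sketch of per-key aggregation with the built-in Aggregators, where <code>scores</code> is
+a hypothetical <code>PTable<String, Long></code> of user IDs to point values:</p>
+<div class="codehilite"><pre>
+// Total and maximum score per user, combined on both the map and reduce sides.
+PTable<String, Long> totals = scores.groupByKey().combineValues(Aggregators.SUM_LONGS());
+PTable<String, Long> maxima = scores.groupByKey().combineValues(Aggregators.MAX_LONGS());
+</pre></div>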
<h3 id="serializing-data-with-ptypes">Serializing Data with PTypes</h3>
<p>Why PTypes Are Necessary, the two type families, the core methods and
tuples.</p>
<h4 id="extending-ptypes">Extending PTypes</h4>
-<h3 id="reading-data-sources">Reading Data: Sources</h3>
-<h3 id="writing-data-targets">Writing Data: Targets</h3>
+<p>The simplest way to create a new <code>PType<T></code> for a data
object is to create a <em>derived</em> PType from one of the built-in PTypes
for the Avro
+and Writable type families. If we have a base <code>PType<S></code>, we
can create a derived <code>PType<T></code> by implementing an input
<code>MapFn<S, T></code> and an
+output <code>MapFn<T, S></code> and then calling
<code>PTypeFamily.derived(Class<T>, MapFn<S, T> in, MapFn<T,
S> out, PType<S> base)</code>, which will return
+a new <code>PType<T></code>. There are examples of derived PTypes in the
<a href="apidocs/current/org/apache/crunch/types/PTypes.html">PTypes</a> class,
including
+serialization support for protocol buffers, Thrift records, Java Enums,
BigInteger, and UUIDs.</p>
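+<p>A sketch of what this might look like for a hypothetical <code>Money</code> class that can be round-tripped
+through a String, here using the derived helper on the Avros class (the Money class and its parse method are
+placeholders):</p>
+<div class="codehilite"><pre>
+PType<Money> moneyType = Avros.derived(Money.class,
+    new MapFn<String, Money>() {
+      @Override
+      public Money map(String input) { return Money.parse(input); }  // input MapFn<S, T>
+    },
+    new MapFn<Money, String>() {
+      @Override
+      public String map(Money money) { return money.toString(); }    // output MapFn<T, S>
+    },
+    Avros.strings());                                                 // base PType<S>
+</pre></div>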
+<h3 id="reading-and-writing-data-sources-targets-and-sourcetargets">Reading
and Writing Data: Sources, Targets, and SourceTargets</h3>
+<p>MapReduce developers are familiar with the <code>InputFormat<K,
V></code> and <code>OutputFormat<K, V></code> classes for reading and
writing data during
+MapReduce processing. Crunch has the analogous concepts of a
<code>Source<T></code> for reading data and a <code>Target</code> for
writing data. For data
+sources that may be treated as both the output of one pipeline phase and the
input to another, Crunch has a <code>SourceTarget<T></code> interface
+that combines the functionality of both <code>Source<T></code> and
<code>Target</code>.</p>
+<p>Sources and Targets provide several useful extensions to the functionality
provided by InputFormat and OutputFormat. First, a Source can
+encapsulate an InputFormat as well as any special Configuration settings that
are needed by that InputFormat. For example, the
+<code>AvroInputFormat</code> needs to know the Avro schema of the input Avro
file and expects to find that schema associated with the "avro.schema" key
+in the <code>Configuration</code> object for a pipeline. But if you need to
read multiple Avro files, each with its own schema, during a single MapReduce
+job, you need a way of ensuring that the different schemas for each file do
not all overwrite the "avro.schema" key in the shared
+<code>Configuration</code> object. Crunch's <code>Source<T></code>
allows you to specify a set of key-value entries that need to be set in the
<code>Configuration</code>
+before a particular input is read in a way that prevents them from conflicting
with each other, while the Target interface provides the same
+functionality for OutputFormats.</p>
+<p>The <code>Source<T></code> interface has two useful extensions. The
first is <code>TableSource<K, V></code>, which extends
<code>Source<Pair<K, V>></code> and can be
+used to read in a <code>PTable<K, V></code> instance instead of a
<code>PCollection<Pair<K, V>></code> instance. The second extension
is <code>ReadableSource<T></code>, which
+declares an <code>Iterable<T> read(Configuration conf)</code> method that
allows the contents of the Source to be read directly, either into the client
+or into a DoFn implementation that can use the data read from the source to
perform additional transforms on the main input data that is
processed using the DoFn's <code>process</code> method (this is how Crunch
supports map-side join operations).</p>
+<p>Support for the most common Source, Target, and SourceTarget
implementations is provided by the factory functions declared in the
+<a href="apidocs/current/org/apache/crunch/io/From.html">From</a> (Sources),
<a href="apidocs/current/org/apache/crunch/io/To.html">To</a> (Targets), and
+<a href="apidocs/current/org/apache/crunch/io/At.html">At</a> (SourceTargets)
classes in the <a
href="apidocs/current/org/apache/crunch/io/package-summary.html">org.apache.crunch.io</a>
+package.</p>
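+<p>For example, a short sketch of reading and writing with these factories, where the paths are placeholders and
+<code>pipeline</code> is an existing Pipeline instance:</p>
+<div class="codehilite"><pre>
+// From, To, and At are in the org.apache.crunch.io package.
+PCollection<String> lines = pipeline.read(From.textFile("/data/input"));
+pipeline.write(lines, To.textFile("/data/output"));
+
+// A SourceTarget can act as the output of one phase and the input of another.
+SourceTarget<String> intermediate = At.textFile("/data/intermediate");
+pipeline.write(lines, intermediate);
+PCollection<String> again = pipeline.read(intermediate);
+</pre></div>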
<h3 id="pipeline-building-and-execution">Pipeline Building and Execution</h3>
<h4 id="creating-a-new-crunch-pipeline">Creating A New Crunch Pipeline</h4>
-<p>Section here on Configuration of pipelines.</p>
<h4 id="managing-pipeline-execution-and-cleanup">Managing Pipeline Execution
and Cleanup</h4>
<h2 id="more-information">More Information</h2>
<p><a href="pipelines.html">Writing Your Own Pipelines</a></p>