current intro.mdtext

jwills Sun, 24 Nov 2013 21:35:38 -0800

Author: jwills
Date: Mon Nov 25 05:34:28 2013
New Revision: 1545156

URL: http://svn.apache.org/r1545156
Log:
Intro link updates


Removed:
    crunch/site/trunk/content/apidocs/current
Modified:
    crunch/site/trunk/content/intro.mdtext

Modified: crunch/site/trunk/content/intro.mdtext
URL: http://svn.apache.org/viewvc/crunch/site/trunk/content/intro.mdtext?rev=1545156&r1=1545155&r2=1545156&view=diff
==============================================================================
--- crunch/site/trunk/content/intro.mdtext (original)
+++ crunch/site/trunk/content/intro.mdtext Mon Nov 25 05:34:28 2013
@@ -173,30 +173,30 @@ applications that can modify their downs
 
 ### Data Model and Operators
 
-The Java API is centered around three interfaces that represent distributed datasets: [PCollection<T>](apidocs/current/org/apache/crunch/PCollection.html),
-[PTable<K, V>](http://crunch.apache.org/apidocs/current/org/apache/crunch/PTable.html), and [PGroupedTable<K, V>](apidocs/current/org/apache/crunch/PGroupedTable.html).
+The Java API is centered around three interfaces that represent distributed datasets: [PCollection<T>](apidocs/0.8.0/org/apache/crunch/PCollection.html),
+[PTable<K, V>](http://crunch.apache.org/apidocs/0.8.0/org/apache/crunch/PTable.html), and [PGroupedTable<K, V>](apidocs/0.8.0/org/apache/crunch/PGroupedTable.html).
 
 A `PCollection<T>` represents a distributed, unordered collection of elements of type T. For example, we represent a text file as a
-`PCollection<String>` object. `PCollection<T>` provides a method, `parallelDo`, that applies a [DoFn<T, U>](apidocs/current/org/apache/crunch/DoFn.html)
+`PCollection<String>` object. `PCollection<T>` provides a method, `parallelDo`, that applies a [DoFn<T, U>](apidocs/0.8.0/org/apache/crunch/DoFn.html)
 to each element in the `PCollection<T>` in parallel, and returns an new `PCollection<U>` as its result.
 
 A `PTable<K, V>` is a sub-interface of `PCollection<Pair<K, V>>` that represents a distributed, unordered multimap of its key type K to its value type V.
 In addition to the parallelDo operation, PTable provides a `groupByKey` operation that aggregates all of the values in the PTable that
 have the same key into a single record. It is the groupByKey operation that triggers the sort phase of a MapReduce job. Developers can exercise
 fine-grained control over the number of reducers and the partitioning, grouping, and sorting strategies used during the shuffle by providing an instance
-of the [GroupingOptions](apidocs/current/org/apache/crunch/GroupingOptions.html) class to the `groupByKey` function.
+of the [GroupingOptions](apidocs/0.8.0/org/apache/crunch/GroupingOptions.html) class to the `groupByKey` function.
 
 The result of a groupByKey operation is a `PGroupedTable<K, V>` object, which is a distributed, sorted map of keys of type K to an Iterable<V> that may
 be iterated over exactly once. In addition to `parallelDo` processing via DoFns, PGroupedTable provides a `combineValues` operation that allows a
-commutative and associative [Aggregator<V>](apidocs/current/org/apache/crunch/Aggregator.html) to be applied to the values of the PGroupedTable
+commutative and associative [Aggregator<V>](apidocs/0.8.0/org/apache/crunch/Aggregator.html) to be applied to the values of the PGroupedTable
 instance on both the map and reduce sides of the shuffle. A number of common `Aggregator<V>` implementations are provided in the
-[Aggregators](apidocs/current/org/apache/crunch/fn/Aggregators.html) class.
+[Aggregators](apidocs/0.8.0/org/apache/crunch/fn/Aggregators.html) class.
 
 Finally, PCollection, PTable, and PGroupedTable all support a `union` operation, which takes a series of distinct PCollections that all have
 the same data type and treats them as a single virtual PCollection.
 
 All of the other data transformation operations supported by the Crunch APIs (aggregations, joins, sorts, secondary sorts, and cogrouping) are implemented
-in terms of these four primitives. The patterns themselves are defined in the [org.apache.crunch.lib](apidocs/current/org/apache/crunch/lib/package-summary.html)
+in terms of these four primitives. The patterns themselves are defined in the [org.apache.crunch.lib](apidocs/0.8.0/org/apache/crunch/lib/package-summary.html)
 package and its children, and a few of of the most common patterns have convenience functions defined on the PCollection and PTable interfaces.
 
 ### Writing DoFns
@@ -246,7 +246,7 @@ framework won't kill it,
 
 Crunch provides a number of helper methods for working with [Hadoop Counters](http://codingwiththomas.blogspot.com/2011/04/controlling-hadoop-job-recursion.html),
 all named `increment`. Counters are an incredibly useful way of keeping track of the state of long running data pipelines and detecting any exceptional conditions that
 occur during processing, and they are supported in both the MapReduce-based and in-memory Crunch pipeline contexts. You can retrive the value of the Counters
-in your client code at the end of a MapReduce pipeline by getting them from the [StageResult](apidocs/current/org/apache/crunch/PipelineResult.StageResult.html)
+in your client code at the end of a MapReduce pipeline by getting them from the [StageResult](apidocs/0.8.0/org/apache/crunch/PipelineResult.StageResult.html)
 objects returned by Crunch at the end of a run.
 
 (Note that there was a change in the Counters API from Hadoop 1.0 to Hadoop 2.0, and thus we do not recommend that you work with the
@@ -271,18 +271,18 @@ memory setting for the DoFn's needs befo
 #### Common DoFn Patterns
 
 The Crunch APIs contain a number of useful subclasses of DoFn that handle common data processing scenarios and are easier
-to write and test. The top-level [org.apache.crunch](apidocs/current/org/apache/crunch/package-summary.html) package contains three
+to write and test. The top-level [org.apache.crunch](apidocs/0.8.0/org/apache/crunch/package-summary.html) package contains three
 of the most important specializations, which we will discuss now. Each of these specialized DoFn implementations has associated methods
 on the PCollection, PTable, and PGroupedTable interfaces to support common data processing steps.
 
-The simplest extension is the [FilterFn<T>](apidocs/current/org/apache/crunch/FilterFn.html) class, which defines a single abstract method, `boolean accept(T input)`.
+The simplest extension is the [FilterFn<T>](apidocs/0.8.0/org/apache/crunch/FilterFn.html) class, which defines a single abstract method, `boolean accept(T input)`.
 The FilterFn can be applied to a `PCollection<T>` by calling the `filter(FilterFn<T> fn)` method, and will return a new `PCollection<T>` that only contains
 the elements of the input PCollection for which the accept method returned true. Note that the filter function does not include a PType argument in its
 signature, because there is no change in the data type of the PCollection when the FilterFn is applied. It is possible to compose new FilterFn
 instances by combining multiple FilterFns together using the `and`, `or`, and `not` factory methods defined in the
-[FilterFns](apidocs/current/org/apache/crunch/fn/FilterFns.html) helper class.
+[FilterFns](apidocs/0.8.0/org/apache/crunch/fn/FilterFns.html) helper class.
 
-The second extension is the [MapFn<S, T>](apidocs/current/org/apache/crunch/MapFn.html) class, which defines a single abstract method, `T map(S input)`.
+The second extension is the [MapFn<S, T>](apidocs/0.8.0/org/apache/crunch/MapFn.html) class, which defines a single abstract method, `T map(S input)`.
 For simple transform tasks in which every input record will have exactly one output, it's easy to test a MapFn by verifying that a given input returns a
 every input record will have exactly one output, it's easy to test a MapFn by verifying that a given input returns a given output.
 
@@ -293,7 +293,7 @@ the key be given and constructs a `PTabl
 has methods `PTable<K1, V> mapKeys(MapFn<K, K1> mapFn)` and `PTable<K, V2> mapValues(MapFn<V, V2>)` that handle the common case of converting
 just one of the paired values in a PTable instance from one type to another while leaving the other type the same.
 
-The final top-level extension to DoFn is the [CombineFn<K, V>](apidocs/current/org/apache/crunch/CombineFn.html) class, which is used in conjunction with
+The final top-level extension to DoFn is the [CombineFn<K, V>](apidocs/0.8.0/org/apache/crunch/CombineFn.html) class, which is used in conjunction with
 the `combineValues` method defined on the PGroupedTable interface. CombineFns are used to represent the associative operations that can be applied using
 the MapReduce Combiner concept in order to reduce the amount data that is shipped over the network during a shuffle.
 
@@ -301,9 +301,9 @@ The CombineFn extension is different fro
 beyond the default `process` method that any other DoFn would use; rather, extending the CombineFn class signals to the Crunch planner that the logic
 contained in this class satisfies the conditions required for use with the MapReduce combiner.
 
-Crunch supports many types of these associative patterns, such as sums, counts, and set unions, via the [Aggregator<V>](apidocs/current/org/apache/crunch/Aggregator.html)
+Crunch supports many types of these associative patterns, such as sums, counts, and set unions, via the [Aggregator<V>](apidocs/0.8.0/org/apache/crunch/Aggregator.html)
 interface, which is defined right alongside the CombineFn class in the top-level `org.apache.crunch` package. There are a number of implementations of the Aggregator
-interface defined via static factory methods in the [Aggregators](apidocs/current/org/apache/crunch/fn/Aggregators.html) class.
+interface defined via static factory methods in the [Aggregators](apidocs/0.8.0/org/apache/crunch/fn/Aggregators.html) class.
 
 ### Serializing Data with PTypes
 
@@ -314,7 +314,7 @@ Why PTypes Are Necessary, the two type f
 The simplest way to create a new `PType<T>` for a data object is to create a _derived_ PType from one of the built-in PTypes for the Avro
 and Writable type families. If we have a base `PType<S>`, we can create a derived `PType<T>` by implementing an input `MapFn<S, T>` and an
 output `MapFn<T, S>` and then calling `PTypeFamily.derived(Class<T>, MapFn<S, T> in, MapFn<T, S> out, PType<S> base)`, which will return
-a new `PType<T>`. There are examples of derived PTypes in the [PTypes](apidocs/current/org/apache/crunch/types/PTypes.html) class, including
+a new `PType<T>`. There are examples of derived PTypes in the [PTypes](apidocs/0.8.0/org/apache/crunch/types/PTypes.html) class, including
 serialization support for protocol buffers, Thrift records, Java Enums, BigInteger, and UUIDs.
 
 ### Reading and Writing Data: Sources, Targets, and SourceTargets
@@ -340,8 +340,8 @@ or into a DoFn implementation that can u
 processed using the DoFn's `process` method (this is how Crunch supports mapside-join operations.)
 
 Support for the most common Source, Target, and SourceTarget implementations are provided by the factory functions declared in the
-[From](apidocs/current/org/apache/crunch/io/From.html) (Sources), [To](apidocs/current/org/apache/crunch/io/To.html) (Targets), and
-[At](apidocs/current/org/apache/crunch/io/At.html) (SourceTargets) classes in the [org.apache.crunch.io](apidocs/current/org/apache/crunch/io/package-summary.html)
+[From](apidocs/0.8.0/org/apache/crunch/io/From.html) (Sources), [To](apidocs/0.8.0/org/apache/crunch/io/To.html) (Targets), and
+[At](apidocs/0.8.0/org/apache/crunch/io/At.html) (SourceTargets) classes in the [org.apache.crunch.io](apidocs/0.8.0/org/apache/crunch/io/package-summary.html)
 package.
 
 ### Pipeline Building and Execution

svn commit: r1545156 - in /crunch/site/trunk/content: apidocs/current intro.mdtext
