This is an automated email from the ASF dual-hosted git repository. mergebot-role pushed a commit to branch mergebot in repository https://gitbox.apache.org/repos/asf/beam-site.git
commit e2ab466cb8f9d2b1db338eaa08c2d4d16469523a Author: Pei He <p...@apache.org> AuthorDate: Fri Sep 8 15:18:11 2017 +0800 [BEAM-2839, BEAM-2838] Add MapReduce runner to Beam asf-site. --- src/_data/capability-matrix.yml | 114 +++++++++++++++++++++++++++++++++ src/documentation/runners/mapreduce.md | 80 +++++++++++++++++++++++ src/get-started/beam-overview.md | 2 + src/images/logos/runners/mapreduce.png | Bin 0 -> 37095 bytes 4 files changed, 196 insertions(+) diff --git a/src/_data/capability-matrix.yml b/src/_data/capability-matrix.yml index 775e0da..c4bbb3b 100644 --- a/src/_data/capability-matrix.yml +++ b/src/_data/capability-matrix.yml @@ -11,6 +11,8 @@ columns: name: Apache Apex - class: gearpump name: Apache Gearpump + - class: mapreduce + name: MapReduce categories: - description: What is being computed? @@ -46,6 +48,10 @@ categories: l1: 'Yes' l2: fully supported l3: Gearpump wraps the per-element transformation function into processor execution. + - class: mapreduce + l1: 'Yes' + l2: fully supported + l3: '' - name: GroupByKey values: - class: model @@ -72,6 +78,10 @@ categories: l1: 'Yes' l2: fully supported l3: "Use Gearpump's groupBy and window for key grouping and translate Beam's windowing and triggering to Gearpump's internal implementation." 
+ - class: mapreduce + l1: 'Yes' + l2: fully supported + l3: '' - name: Flatten values: - class: model @@ -98,6 +108,10 @@ categories: l1: 'Yes' l2: fully supported l3: '' + - class: mapreduce + l1: 'Yes' + l2: fully supported + l3: '' - name: Combine values: - class: model @@ -124,6 +138,10 @@ categories: l1: 'Yes' l2: fully supported l3: '' + - class: mapreduce + l1: 'Yes' + l2: fully supported + l3: '' - name: Composite Transforms values: - class: model @@ -150,6 +168,10 @@ categories: l1: 'Partially' l2: supported via inlining l3: '' + - class: mapreduce + l1: 'Yes' + l2: fully supported + l3: '' - name: Side Inputs values: - class: model @@ -176,6 +198,10 @@ categories: l1: 'Yes' l2: fully supported l3: Implemented by merging side input as a normal stream in Gearpump + - class: mapreduce + l1: 'Yes' + l2: fully supported + l3: '' - name: Source API values: - class: model @@ -202,6 +228,10 @@ categories: l1: 'Yes' l2: fully supported l3: '' + - class: mapreduce + l1: 'Partially' + l2: bounded source only + l3: '' - name: Splittable DoFn values: - class: model @@ -228,6 +258,10 @@ categories: l1: 'No' l2: not implemented l3: + - class: mapreduce + l1: 'No' + l2: not implemented + l3: - name: Metrics values: - class: model @@ -254,6 +288,10 @@ categories: l1: 'No' l2: '' l3: not implemented + - class: mapreduce + l1: 'Partially' + l2: Only attempted counters are supported + l3: '' - name: Stateful Processing values: - class: model @@ -280,6 +318,10 @@ categories: l1: 'No' l2: not implemented l3: '' + - class: mapreduce + l1: 'Partially' + l2: non-merging windows + l3: '' - description: Where in event time? 
anchor: where color-b: '37d' @@ -313,6 +355,10 @@ categories: l1: 'Yes' l2: supported l3: '' + - class: mapreduce + l1: 'Yes' + l2: supported + l3: '' - name: Fixed windows values: - class: model @@ -339,6 +385,10 @@ categories: l1: 'Yes' l2: supported l3: '' + - class: mapreduce + l1: 'Yes' + l2: supported + l3: '' - name: Sliding windows values: - class: model @@ -365,6 +415,10 @@ categories: l1: 'Yes' l2: supported l3: '' + - class: mapreduce + l1: 'Yes' + l2: supported + l3: '' - name: Session windows values: - class: model @@ -391,6 +445,10 @@ categories: l1: 'Yes' l2: supported l3: '' + - class: mapreduce + l1: 'Yes' + l2: supported + l3: '' - name: Custom windows values: - class: model @@ -417,6 +475,10 @@ categories: l1: 'Yes' l2: supported l3: '' + - class: mapreduce + l1: 'Yes' + l2: supported + l3: '' - name: Custom merging windows values: - class: model @@ -443,6 +505,10 @@ categories: l1: 'Yes' l2: supported l3: '' + - class: mapreduce + l1: 'Yes' + l2: supported + l3: '' - name: Timestamp control values: - class: model @@ -469,6 +535,10 @@ categories: l1: 'Yes' l2: supported l3: '' + - class: mapreduce + l1: 'Yes' + l2: supported + l3: '' - description: When in processing time? @@ -505,6 +575,10 @@ categories: l1: 'No' l2: '' l3: '' + - class: mapreduce + l1: 'Yes' + l2: Intermediate trigger firings are effectively meaningless. + l3: '' - name: Event-time triggers values: @@ -532,6 +606,10 @@ categories: l1: 'Yes' l2: fully supported l3: '' + - class: mapreduce + l1: 'Yes' + l2: Currently watermark progress jumps from the beginning of time to the end of time once the input has been fully consumed, thus no additional triggering granularity is available. 
+ l3: '' - name: Processing-time triggers values: @@ -559,6 +637,10 @@ categories: l1: 'No' l2: '' l3: '' + - class: mapreduce + l1: 'Yes' + l2: From the perspective of triggers, processing time currently jumps from the beginning of time to the end of time once the input has been fully consumed, thus no additional triggering granularity is available. + l3: '' - name: Count triggers values: @@ -586,6 +668,10 @@ categories: l1: 'No' l2: '' l3: '' + - class: mapreduce + l1: 'Yes' + l2: Elements are processed in the largest bundles possible, so count-based triggers are effectively meaningless. + l3: '' - name: '[Meta]data driven triggers' values: @@ -614,6 +700,10 @@ categories: l1: 'No' l2: pending model support l3: + - class: mapreduce + l1: 'No' + l2: pending model support + l3: - name: Composite triggers values: @@ -641,6 +731,10 @@ categories: l1: 'No' l2: '' l3: '' + - class: mapreduce + l1: 'Yes' + l2: fully supported + l3: '' - name: Allowed lateness values: @@ -668,6 +762,10 @@ categories: l1: 'Yes' l2: fully supported l3: '' + - class: mapreduce + l1: 'Yes' + l2: No data is ever late. + l3: '' - name: Timers values: @@ -695,6 +793,10 @@ categories: l1: 'No' l2: not implemented l3: '' + - class: mapreduce + l1: 'Partially' + l2: not implemented + l3: '' - description: How do refinements relate? 
anchor: how @@ -730,6 +832,10 @@ categories: l1: 'Yes' l2: fully supported l3: '' + - class: mapreduce + l1: 'Yes' + l2: fully supported + l3: '' - name: Accumulating values: - class: model @@ -757,6 +863,10 @@ categories: l1: 'No' l2: '' l3: '' + - class: mapreduce + l1: 'Yes' + l2: fully supported + l3: '' - name: 'Accumulating & Retracting' values: @@ -785,3 +895,7 @@ categories: l1: 'No' l2: pending model support l3: '' + - class: mapreduce + l1: 'No' + l2: pending model support + l3: ''
diff --git a/src/documentation/runners/mapreduce.md b/src/documentation/runners/mapreduce.md
new file mode 100644
index 0000000..c88870e
--- /dev/null
+++ b/src/documentation/runners/mapreduce.md
@@ -0,0 +1,80 @@
+---
+layout: default
+title: "Apache Hadoop MapReduce Runner"
+permalink: /documentation/runners/mapreduce/
+redirect_from: /learn/runners/mapreduce/
+---
+# Using the Apache Hadoop MapReduce Runner
+
+The Apache Hadoop MapReduce Runner can be used to execute Beam pipelines using [Apache Hadoop](http://hadoop.apache.org/).
+
+The [Beam Capability Matrix]({{ site.baseurl }}/documentation/runners/capability-matrix/) documents the currently supported capabilities of the Apache Hadoop MapReduce Runner.
+
+## Apache Hadoop MapReduce Runner prerequisites and setup
+You need an Apache Hadoop environment with either a [Single Node Setup](https://hadoop.apache.org/docs/r1.2.1/single_node_setup.html) or a [Cluster Setup](https://hadoop.apache.org/docs/r1.2.1/cluster_setup.html).
+
+The Apache Hadoop MapReduce runner currently supports Apache Hadoop version 2.8.1.
+
+You can add a dependency on the latest version of the Apache Hadoop MapReduce runner by adding the following to your pom.xml:
+```xml
+<dependency>
+  <groupId>org.apache.beam</groupId>
+  <artifactId>beam-runners-mapreduce</artifactId>
+  <version>{{ site.release_latest }}</version>
+</dependency>
+```
+
+## Deploying Apache Hadoop MapReduce with your application
+To execute in a local Hadoop environment, use this command:
+```
+$ mvn exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
+    -Pmapreduce-runner \
+    -Dexec.args="--runner=MapReduceRunner \
+      --inputFile=/path/to/pom.xml \
+      --output=/path/to/counts \
+      --fileOutputDir=<directory for intermediate outputs>"
+```
+
+To execute on a Hadoop cluster, you need to package your program along with all its dependencies in a so-called fat jar.
+
+If you are following the [Beam Quickstart]({{ site.baseurl }}/get-started/quickstart/), this is the command you can run:
+```
+$ mvn package -Pmapreduce-runner
+```
+
+To actually run the pipeline, use this command:
+```
+$ yarn jar word-count-beam-bundled-0.1.jar \
+    org.apache.beam.examples.WordCount \
+    --runner=MapReduceRunner \
+    --inputFile=/path/to/pom.xml \
+    --output=/path/to/counts \
+    --fileOutputDir=<directory for intermediate outputs>
+```
+
+## Pipeline options for the Apache Hadoop MapReduce Runner
+
+When executing your pipeline with the Apache Hadoop MapReduce Runner, you should consider the following pipeline options.
+
+<table class="table table-bordered">
+<tr>
+  <th>Field</th>
+  <th>Description</th>
+  <th>Default Value</th>
+</tr>
+<tr>
+  <td><code>runner</code></td>
+  <td>The pipeline runner to use.
This option allows you to determine the pipeline runner at runtime.</td>
+  <td>Set to <code>MapReduceRunner</code> to run using Apache Hadoop MapReduce.</td>
+</tr>
+<tr>
+  <td><code>jarClass</code></td>
+  <td>The jar class of the user Beam program.</td>
+  <td><code>JarClassInstanceFactory.class</code></td>
+</tr>
+<tr>
+  <td><code>fileOutputDir</code></td>
+  <td>The directory for output files.</td>
+  <td><code>"/tmp/mapreduce/"</code></td>
+</tr>
+</table>
diff --git a/src/get-started/beam-overview.md b/src/get-started/beam-overview.md
index 1d3bbc6..e320c3f 100644
--- a/src/get-started/beam-overview.md
+++ b/src/get-started/beam-overview.md
@@ -36,6 +36,8 @@ Beam currently supports Runners that work with the following distributed process
   alt="Apache Flink">
* Apache Gearpump (incubating) <img src="{{ site.baseurl }}/images/logos/runners/gearpump.png"
  alt="Apache Gearpump">
+* Apache Hadoop MapReduce <img src="{{ site.baseurl }}/images/logos/runners/mapreduce.png"
+  alt="Apache Hadoop MapReduce">
* Apache Spark <img src="{{ site.baseurl }}/images/logos/runners/spark.png"
  alt="Apache Spark">
* Google Cloud Dataflow <img src="{{ site.baseurl }}/images/logos/runners/dataflow.png"
diff --git a/src/images/logos/runners/mapreduce.png b/src/images/logos/runners/mapreduce.png
new file mode 100644
index 0000000..78af2c6
Binary files /dev/null and b/src/images/logos/runners/mapreduce.png differ
--
To stop receiving notification emails like this one, please contact "commits@beam.apache.org" <commits@beam.apache.org>.
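[Editor's note appended to the patch, not part of the commit] The runner documentation above passes pipeline options as `--key=value` command-line arguments (`--runner=MapReduceRunner`, `--fileOutputDir=...`). As a rough illustration of that convention, here is a toy sketch of parsing such arguments into an options map; this is a simplified stand-in, not Beam's actual `PipelineOptionsFactory`:

```java
import java.util.HashMap;
import java.util.Map;

// Toy sketch: parse "--key=value" pipeline arguments (as used by the
// WordCount commands in the doc above) into a simple options map.
// Not Beam's real option parsing, just an illustration of the format.
public class PipelineArgParser {
    public static Map<String, String> parse(String[] args) {
        Map<String, String> options = new HashMap<>();
        for (String arg : args) {
            if (!arg.startsWith("--")) {
                throw new IllegalArgumentException("Expected --key=value, got: " + arg);
            }
            int eq = arg.indexOf('=');
            // A bare "--flag" (no '=') is treated as a boolean "true".
            String key = eq < 0 ? arg.substring(2) : arg.substring(2, eq);
            String value = eq < 0 ? "true" : arg.substring(eq + 1);
            options.put(key, value);
        }
        return options;
    }

    public static void main(String[] args) {
        Map<String, String> opts = parse(new String[] {
            "--runner=MapReduceRunner",
            "--inputFile=/path/to/pom.xml",
            "--fileOutputDir=/tmp/mapreduce/"
        });
        System.out.println(opts.get("runner")); // prints MapReduceRunner
    }
}
```

In the real runner, the same `--runner=MapReduceRunner` string selects the runner class at runtime, which is why the `runner` row in the options table has no fixed default.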