[beam] branch master updated: [BEAM-11988] added a fix for missing sample.txt (#14889)

bhulette Fri, 28 May 2021 15:39:19 -0700

This is an automated email from the ASF dual-hosted git repository.

bhulette pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git



The following commit(s) were added to refs/heads/master by this push:
     new 79d4c03  [BEAM-11988] added a fix for missing sample.txt (#14889)
79d4c03 is described below

commit 79d4c03ea922e4f6a89f3ebb92947595dcd7121b
Author: David Huntsperger <[email protected]>
AuthorDate: Fri May 28 15:37:08 2021 -0700

    [BEAM-11988] added a fix for missing sample.txt (#14889)
    
    * added a fix for missing sample.txt
    
    * added example output for direct runner -- accidentally deleted
---
 .../site/content/en/get-started/quickstart-java.md | 187 ++++++++++++---------
 1 file changed, 106 insertions(+), 81 deletions(-)

diff --git a/website/www/site/content/en/get-started/quickstart-java.md b/website/www/site/content/en/get-started/quickstart-java.md
index 4286a4d..55be73e 100644
--- a/website/www/site/content/en/get-started/quickstart-java.md
+++ b/website/www/site/content/en/get-started/quickstart-java.md
@@ -35,9 +35,9 @@ If you're interested in contributing to the Apache Beam Java codebase, see the [
 
 1. Optional: Install [Gradle](https://gradle.org/install/) if you would like to convert your Maven project into Gradle.
 
-## Get the WordCount Code
+## Get the Example Code
 
-The easiest way to get a copy of the WordCount pipeline is to use the following command to generate a simple Maven project that contains Beam's WordCount examples and builds against the most recent Beam release:
+Use the following command to generate a Maven project that contains Beam's WordCount examples and builds against the most recent Beam release:
 
 {{< highlight class="shell-unix" >}}
 $ mvn archetype:generate \
@@ -63,7 +63,7 @@ PS> mvn archetype:generate `
  -D interactiveMode=false
 {{< /highlight >}}
 
-This will create a directory `word-count-beam` that contains a simple `pom.xml` and a series of example pipelines that count words in text files.
+This will create a `word-count-beam` directory that contains a `pom.xml` and several example pipelines that count words in text files.
 
 {{< highlight class="shell-unix" >}}
 $ cd word-count-beam/
@@ -142,6 +142,13 @@ task execute (type:JavaExec) {
 $ gradle build
 {{< /highlight >}}
 
+## Get sample text
+
+> If you're planning to use the DataflowRunner, you can skip this step. The runner will pull text directly from Google Cloud Storage.
+
+1. In the **word-count-beam** directory, create a file called **sample.txt**.
+1. Add some text to the file. For this example, you can use the text of Shakespeare's [Sonnets](https://storage.cloud.google.com/apache-beam-samples/shakespeare/sonnets.txt).
+
 ## Run a pipeline
 
 A single Beam pipeline can run on multiple Beam [runners](/documentation#runners), including the [FlinkRunner](/documentation/runners/flink), [SparkRunner](/documentation/runners/spark), [NemoRunner](/documentation/runners/nemo), [JetRunner](/documentation/runners/jet), or [DataflowRunner](/documentation/runners/dataflow). The [DirectRunner](/documentation/runners/direct) is a common runner for getting started, as it runs locally on your machine and requires no specific setup. If you're  [...]
@@ -163,25 +170,25 @@ For Unix shells:
 
 {{< highlight class="runner-direct" >}}
 $ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
-     -Dexec.args="--inputFile=/path/to/inputfile --output=counts" -Pdirect-runner
+     -Dexec.args="--inputFile=sample.txt --output=counts" -Pdirect-runner
 {{< /highlight >}}
 
 {{< highlight class="runner-flink-local" >}}
 $ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
-     -Dexec.args="--runner=FlinkRunner --inputFile=/path/to/inputfile --output=counts" -Pflink-runner
+     -Dexec.args="--runner=FlinkRunner --inputFile=sample.txt --output=counts" -Pflink-runner
 {{< /highlight >}}
 
 {{< highlight class="runner-flink-cluster" >}}
 $ mvn package exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
      -Dexec.args="--runner=FlinkRunner --flinkMaster=<flink master> --filesToStage=target/word-count-beam-bundled-0.1.jar \
-                  --inputFile=/path/to/quickstart/pom.xml --output=/tmp/counts" -Pflink-runner
+                  --inputFile=sample.txt --output=/tmp/counts" -Pflink-runner
 
 You can monitor the running job by visiting the Flink dashboard at http://<flink master>:8081
 {{< /highlight >}}
 
 {{< highlight class="runner-spark" >}}
 $ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
-     -Dexec.args="--runner=SparkRunner --inputFile=/path/to/inputfile --output=counts" -Pspark-runner
+     -Dexec.args="--runner=SparkRunner --inputFile=sample.txt --output=counts" -Pspark-runner
 {{< /highlight >}}
 
 {{< highlight class="runner-dataflow" >}}
@@ -197,43 +204,43 @@ $ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
 
 {{< highlight class="runner-samza-local" >}}
 $ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
-     -Dexec.args="--inputFile=/path/to/inputfile --output=/tmp/counts --runner=SamzaRunner" -Psamza-runner
+     -Dexec.args="--inputFile=sample.txt --output=/tmp/counts --runner=SamzaRunner" -Psamza-runner
 {{< /highlight >}}
 
 {{< highlight class="runner-nemo" >}}
 $ mvn package -Pnemo-runner && java -cp target/word-count-beam-bundled-0.1.jar org.apache.beam.examples.WordCount \
-     --runner=NemoRunner --inputFile=`pwd`/pom.xml --output=counts
+     --runner=NemoRunner --inputFile=`pwd`/sample.txt --output=counts
 {{< /highlight >}}
 
 {{< highlight class="runner-jet" >}}
 $ mvn package -Pjet-runner
 $ java -cp target/word-count-beam-bundled-0.1.jar org.apache.beam.examples.WordCount \
-     --runner=JetRunner --jetLocalMode=3 --inputFile=`pwd`/pom.xml --output=counts
+     --runner=JetRunner --jetLocalMode=3 --inputFile=`pwd`/sample.txt --output=counts
 {{< /highlight >}}
 
 For Windows PowerShell:
 
 {{< highlight class="runner-direct" >}}
 PS> mvn compile exec:java -D exec.mainClass=org.apache.beam.examples.WordCount `
- -D exec.args="--inputFile=/path/to/inputfile --output=counts" -P direct-runner
+ -D exec.args="--inputFile=sample.txt --output=counts" -P direct-runner
 {{< /highlight >}}
 
 {{< highlight class="runner-flink-local" >}}
 PS> mvn compile exec:java -D exec.mainClass=org.apache.beam.examples.WordCount `
- -D exec.args="--runner=FlinkRunner --inputFile=/path/to/inputfile --output=counts" -P flink-runner
+ -D exec.args="--runner=FlinkRunner --inputFile=sample.txt --output=counts" -P flink-runner
 {{< /highlight >}}
 
 {{< highlight class="runner-flink-cluster" >}}
 PS> mvn package exec:java -D exec.mainClass=org.apache.beam.examples.WordCount `
  -D exec.args="--runner=FlinkRunner --flinkMaster=<flink master> --filesToStage=.\target\word-count-beam-bundled-0.1.jar `
-               --inputFile=C:\path\to\quickstart\pom.xml --output=C:\tmp\counts" -P flink-runner
+               --inputFile=C:\path\to\quickstart\sample.txt --output=C:\tmp\counts" -P flink-runner
 
 You can monitor the running job by visiting the Flink dashboard at http://<flink master>:8081
 {{< /highlight >}}
 
 {{< highlight class="runner-spark" >}}
 PS> mvn compile exec:java -D exec.mainClass=org.apache.beam.examples.WordCount `
- -D exec.args="--runner=SparkRunner --inputFile=/path/to/inputfile --output=counts" -P spark-runner
+ -D exec.args="--runner=SparkRunner --inputFile=sample.txt --output=counts" -P spark-runner
 {{< /highlight >}}
 
 {{< highlight class="runner-dataflow" >}}
@@ -249,19 +256,19 @@ PS> mvn compile exec:java -D exec.mainClass=org.apache.beam.examples.WordCount `
 
 {{< highlight class="runner-samza-local" >}}
 PS> mvn compile exec:java -D exec.mainClass=org.apache.beam.examples.WordCount `
-     -D exec.args="--inputFile=/path/to/inputfile --output=/tmp/counts --runner=SamzaRunner" -P samza-runner
+     -D exec.args="--inputFile=sample.txt --output=/tmp/counts --runner=SamzaRunner" -P samza-runner
 {{< /highlight >}}
 
 {{< highlight class="runner-nemo" >}}
 PS> mvn package -P nemo-runner -DskipTests
 PS> java -cp target/word-count-beam-bundled-0.1.jar org.apache.beam.examples.WordCount `
-      --runner=NemoRunner --inputFile=`pwd`/pom.xml --output=counts
+      --runner=NemoRunner --inputFile=`pwd`/sample.txt --output=counts
 {{< /highlight >}}
 
 {{< highlight class="runner-jet" >}}
 PS> mvn package -P jet-runner
 PS> java -cp target/word-count-beam-bundled-0.1.jar org.apache.beam.examples.WordCount `
-      --runner=JetRunner --jetLocalMode=3 --inputFile=$pwd/pom.xml --output=counts
+      --runner=JetRunner --jetLocalMode=3 --inputFile=$pwd/sample.txt --output=counts
 {{< /highlight >}}
 
 ### Run WordCount Using Gradle
@@ -270,7 +277,7 @@ For Unix shells (Instructions currently only available for Direct, Spark, and Da
 
 {{< highlight class="runner-direct">}}
 $ gradle clean execute -DmainClass=org.apache.beam.examples.WordCount \
-    -Dexec.args="--inputFile=/path/to/inputfile --output=counts" -Pdirect-runner
+    -Dexec.args="--inputFile=sample.txt --output=counts" -Pdirect-runner
 {{< /highlight >}}
 
 {{< highlight class="runner-apex">}}
@@ -287,7 +294,7 @@ We are working on adding the instruction for this runner!
 
 {{< highlight class="runner-spark" >}}
 $ gradle clean execute -DmainClass=org.apache.beam.examples.WordCount \
-    -Dexec.args="--inputFile=/path/to/inputfile --output=counts" -Pspark-runner
+    -Dexec.args="--inputFile=sample.txt --output=counts" -Pspark-runner
 {{< /highlight >}}
 
 {{< highlight class="runner-dataflow" >}}
@@ -348,104 +355,122 @@ When you look into the contents of the file, you'll see that they contain unique
 
 {{< highlight class="runner-direct" >}}
 $ more counts*
-api: 9
-bundled: 1
-old: 4
-Apache: 2
-The: 1
-limitations: 1
-Foundation: 1
+wrought: 2
+st: 32
+fresher: 1
+of: 351
+souls: 2
+CXVIII: 1
+reviewest: 1
+untold: 1
+th: 1
+single: 4
 ...
 {{< /highlight >}}
 
 {{< highlight class="runner-flink-local" >}}
 $ more counts*
-The: 1
-api: 9
-old: 4
-Apache: 2
-limitations: 1
-bundled: 1
-Foundation: 1
+wrought: 2
+st: 32
+fresher: 1
+of: 351
+souls: 2
+CXVIII: 1
+reviewest: 1
+untold: 1
+th: 1
+single: 4
 ...
 {{< /highlight >}}
 
 {{< highlight class="runner-flink-cluster" >}}
 $ more /tmp/counts*
-The: 1
-api: 9
-old: 4
-Apache: 2
-limitations: 1
-bundled: 1
-Foundation: 1
+wrought: 2
+st: 32
+fresher: 1
+of: 351
+souls: 2
+CXVIII: 1
+reviewest: 1
+untold: 1
+th: 1
+single: 4
 ...
 {{< /highlight >}}
 
 {{< highlight class="runner-spark" >}}
 $ more counts*
-beam: 27
-SF: 1
-fat: 1
-job: 1
-limitations: 1
-require: 1
-of: 11
-profile: 10
+wrought: 2
+st: 32
+fresher: 1
+of: 351
+souls: 2
+CXVIII: 1
+reviewest: 1
+untold: 1
+th: 1
+single: 4
 ...
 {{< /highlight >}}
 
 
 {{< highlight class="runner-dataflow" >}}
 $ gsutil cat gs://<your-gcs-bucket>/counts*
-feature: 15
-smother'st: 1
-revelry: 1
-bashfulness: 1
-Bashful: 1
-Below: 2
-deserves: 32
-barrenly: 1
+wrought: 2
+st: 32
+fresher: 1
+of: 351
+souls: 2
+CXVIII: 1
+reviewest: 1
+untold: 1
+th: 1
+single: 4
 ...
 {{< /highlight >}}
 
 {{< highlight class="runner-samza-local" >}}
 $ more /tmp/counts*
-api: 7
-are: 2
-can: 2
-com: 14
-end: 14
-for: 14
-has: 2
+wrought: 2
+st: 32
+fresher: 1
+of: 351
+souls: 2
+CXVIII: 1
+reviewest: 1
+untold: 1
+th: 1
+single: 4
 ...
 {{< /highlight >}}
 
 {{< highlight class="runner-nemo" >}}
 $ more counts*
-cluster: 2
-handler: 1
-plugins: 9
-exclusions: 14
-finalName: 2
-Adds: 2
-java: 7
-xml: 1
+wrought: 2
+st: 32
+fresher: 1
+of: 351
+souls: 2
+CXVIII: 1
+reviewest: 1
+untold: 1
+th: 1
+single: 4
 ...
 {{< /highlight >}}
 
 {{< highlight class="runner-jet" >}}
 $ more counts*
-FlinkRunner: 1
-cleanupDaemonThreads: 2
-sdks: 4
-unit: 1
-Apache: 3
-IO: 2
-copyright: 1
-governing: 1
-overrides: 1
-YARN: 1
+wrought: 2
+st: 32
+fresher: 1
+of: 351
+souls: 2
+CXVIII: 1
+reviewest: 1
+untold: 1
+th: 1
+single: 4
 ...
 {{< /highlight >}}
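
Note (not part of the commit): the "Get sample text" step added above can be sketched for a Unix shell as follows. This is a minimal illustration under stated assumptions. The two lines are a short stand-in taken from the Sonnets rather than the full text, `local-counts.txt` is a hypothetical file name, and the `tr`/`sort`/`uniq` pipeline only approximates a word count by splitting on non-letter characters (which is consistent with tokens such as `st` and `th` in the example output above). It is not Beam's WordCount implementation.

```shell
# Create sample.txt in the current directory (assumed to be word-count-beam).
# The text is a two-line stand-in, not the full Sonnets file.
cat > sample.txt <<'EOF'
From fairest creatures we desire increase,
That thereby beauty's rose might never die,
EOF

# Approximate a word count with standard tools:
#   tr   -> turn every run of non-letters into a single newline (one token per line)
#   sed  -> drop any empty lines
#   sort | uniq -c -> count identical tokens
#   awk  -> print in "word: count" form
# local-counts.txt is a hypothetical name chosen to avoid clashing with counts*.
tr -cs 'A-Za-z' '\n' < sample.txt \
  | sed '/^$/d' \
  | sort \
  | uniq -c \
  | awk '{print $2 ": " $1}' > local-counts.txt

cat local-counts.txt
```

With `sample.txt` in place, the runner commands in the diff that pass `--inputFile=sample.txt` have a file to read. The local pipeline is only a quick sanity check of the input.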
