[GitHub] [beam] pcoet commented on a change in pull request #17033: BEAM-12770: Documented maven-to-gradle conversion for Dataflow

GitBox Mon, 14 Mar 2022 17:44:03 -0700


pcoet commented on a change in pull request #17033:
URL: https://github.com/apache/beam/pull/17033#discussion_r826488518




##########
File path: website/www/site/content/en/get-started/quickstart-java.md
##########
@@ -19,462 +19,376 @@ See the License for the specific language governing permissions and
 limitations under the License.
 -->
 
-# Apache Beam Java SDK Quickstart
+# Apache Beam Java SDK quickstart
 
-This quickstart shows you how to set up a Java development environment and run an [example pipeline](/get-started/wordcount-example) written with the [Apache Beam Java SDK](/documentation/sdks/java), using a [runner](/documentation#runners) of your choice.
+This quickstart shows you how to set up a Java development environment and run
+an [example pipeline](/get-started/wordcount-example) written with the
+[Apache Beam Java SDK](/documentation/sdks/java), using a
+[runner](/documentation#runners) of your choice.
 
-If you're interested in contributing to the Apache Beam Java codebase, see the [Contribution Guide](/contribute).
+If you're interested in contributing to the Apache Beam Java codebase, see the
+[Contribution Guide](/contribute).
 
-{{< toc >}}
+On this page:
 
-## Set up your Development Environment
-
-1. Download and install the [Java Development Kit (JDK)](https://www.oracle.com/technetwork/java/javase/downloads/index.html) version 8. Verify that the [JAVA_HOME](https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/envvars001.html) environment variable is set and points to your JDK installation.
-
-1. Download and install [Apache Maven](https://maven.apache.org/download.cgi) by following Maven's [installation guide](https://maven.apache.org/install.html) for your specific operating system.
-
-1. Optional: Install [Gradle](https://gradle.org/install/) if you would like to convert your Maven project into Gradle.
-
-## Get the Example Code
-
-Use the following command to generate a Maven project that contains Beam's WordCount examples and builds against the most recent Beam release:
-
-{{< shell unix >}}
-$ mvn archetype:generate \
-      -DarchetypeGroupId=org.apache.beam \
-      -DarchetypeArtifactId=beam-sdks-java-maven-archetypes-examples \
-      -DarchetypeVersion={{< param release_latest >}} \
-      -DgroupId=org.example \
-      -DartifactId=word-count-beam \
-      -Dversion="0.1" \
-      -Dpackage=org.apache.beam.examples \
-      -DinteractiveMode=false
-{{< /shell >}}
-
-{{< shell powerShell >}}
-PS> mvn archetype:generate `
- -D archetypeGroupId=org.apache.beam `
- -D archetypeArtifactId=beam-sdks-java-maven-archetypes-examples `
- -D archetypeVersion={{< param release_latest >}} `
- -D groupId=org.example `
- -D artifactId=word-count-beam `
- -D version="0.1" `
- -D package=org.apache.beam.examples `
- -D interactiveMode=false
-{{< /shell >}}
-
-This will create a `word-count-beam` directory that contains a `pom.xml` and several example pipelines that count words in text files.
-
-{{< shell unix >}}
-$ cd word-count-beam/
-
-$ ls
-pom.xml        src
-
-$ ls src/main/java/org/apache/beam/examples/
-DebuggingWordCount.java        WindowedWordCount.java  common
-MinimalWordCount.java  WordCount.java
-{{< /shell >}}
-
-{{< shell powerShell >}}
-PS> cd .\word-count-beam
-
-PS> dir
-
-...
-
-Mode                LastWriteTime         Length Name
-----                -------------         ------ ----
-d-----        7/19/2018  11:00 PM                src
--a----        7/19/2018  11:00 PM          16051 pom.xml
-
-PS> dir .\src\main\java\org\apache\beam\examples
-
-...
-Mode                LastWriteTime         Length Name
-----                -------------         ------ ----
-d-----        7/19/2018  11:00 PM                common
-d-----        7/19/2018  11:00 PM                complete
-d-----        7/19/2018  11:00 PM                subprocess
--a----        7/19/2018  11:00 PM           7073 DebuggingWordCount.java
--a----        7/19/2018  11:00 PM           5945 MinimalWordCount.java
--a----        7/19/2018  11:00 PM           9490 WindowedWordCount.java
--a----        7/19/2018  11:00 PM           7662 WordCount.java
-{{< /shell >}}
-
-For a detailed introduction to the Beam concepts used in these examples, see the [WordCount Example Walkthrough](/get-started/wordcount-example). Here, we'll just focus on executing `WordCount.java`.
-
-## Optional: Convert from Maven to Gradle Project
-
-The steps below explain how to convert the build for the Direct Runner from Maven to Gradle. Converting the builds for the other runners is a more involved process and is out of scope for this guide. For additional guidance, see [Migrating Builds From Apache Maven](https://docs.gradle.org/current/userguide/migrating_from_maven.html).
-
-1. Ensure you are in the same directory as the `pom.xml` file generated from the previous step. Automatically convert your project from Maven to Gradle by running:
-{{< highlight >}}
-$ gradle init
-{{< /highlight >}}
-You'll be asked if you want to generate a Gradle build. Enter **yes**. You'll also be prompted to choose a DSL (Groovy or Kotlin). This tutorial uses Groovy, so select that if you don't have a preference.
-1. After you've converted the project to Gradle, open the generated `build.gradle` file, and, in the `repositories` block, replace `mavenLocal()` with `mavenCentral()`:
-{{< highlight >}}
-repositories {
-    mavenCentral()
-    maven {
-        url = uri('https://repository.apache.org/content/repositories/snapshots/')
-    }
+{{< toc >}}
 
-    maven {
-        url = uri('http://repo.maven.apache.org/maven2')
+## Set up your development environment
+
+1. Download and install the
+  [Java Development Kit (JDK)](https://www.oracle.com/technetwork/java/javase/downloads/index.html)
+  version 8, 11, or 17. Verify that the
+  [JAVA_HOME](https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/envvars001.html)
+  environment variable is set and points to your JDK installation.
+2. Download and install [Apache Maven](https://maven.apache.org/download.cgi) by
+   following the [installation guide](https://maven.apache.org/install.html)
+   for your operating system.
+3. Optional: If you want to convert your Maven project to Gradle, install
+   [Gradle](https://gradle.org/install/).
+
+## Get the example code
+
+1. Generate a Maven example project that builds against the latest Beam release:
+   {{< shell unix >}}
+mvn archetype:generate \
+    -DarchetypeGroupId=org.apache.beam \
+    -DarchetypeArtifactId=beam-sdks-java-maven-archetypes-examples \
+    -DarchetypeVersion={{< param release_latest >}} \
+    -DgroupId=org.example \
+    -DartifactId=word-count-beam \
+    -Dversion="0.1" \
+    -Dpackage=org.apache.beam.examples \
+    -DinteractiveMode=false
+   {{< /shell >}}
+   {{< shell powerShell >}}
+mvn archetype:generate `
+  -D archetypeGroupId=org.apache.beam `
+  -D archetypeArtifactId=beam-sdks-java-maven-archetypes-examples `
+  -D archetypeVersion={{< param release_latest >}} `
+  -D groupId=org.example `
+  -D artifactId=word-count-beam `
+  -D version="0.1" `
+  -D package=org.apache.beam.examples `
+  -D interactiveMode=false
+   {{< /shell >}}
+
+   Maven creates a new project in the **word-count-beam** directory.
+
+2. Change into **word-count-beam**:
+   {{< shell unix >}}
+cd word-count-beam/
+   {{< /shell >}}
+   {{< shell powerShell >}}
+cd .\word-count-beam
+   {{< /shell >}}
+   The directory contains a **pom.xml** and a **src** directory with example
+   pipelines.
+
+3. List the example pipelines:
+   {{< shell unix >}}
+ls src/main/java/org/apache/beam/examples/
+   {{< /shell >}}
+   {{< shell powerShell >}}
+dir .\src\main\java\org\apache\beam\examples
+   {{< /shell >}}
+   You should see the following examples:
+   * **DebuggingWordCount.java** ([GitHub](https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/DebuggingWordCount.java))
+   * **MinimalWordCount.java** ([GitHub](https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/MinimalWordCount.java))
+   * **WindowedWordCount.java** ([GitHub](https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/WindowedWordCount.java))
+   * **WordCount.java** ([GitHub](https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/WordCount.java))
+
+   The example used in this tutorial, **WordCount.java**, defines a
+   Beam pipeline that counts words from an input file (by default, a **.txt**
+   file containing Shakespeare's "King Lear"). To learn more about the examples,
+   see the [WordCount Example Walkthrough](/get-started/wordcount-example).
+
+## Optional: Convert from Maven to Gradle
+
+The steps below explain how to convert the build from Maven to Gradle for the
+following runners:
+* Direct runner
+* Dataflow runner
+
+The conversion process for other runners is similar. For additional guidance,
+see
+[Migrating Builds From Apache Maven](https://docs.gradle.org/current/userguide/migrating_from_maven.html).
+
+1. In the directory with the `pom.xml` file, run the automated Maven-to-Gradle
+   conversion:
+   {{< highlight >}}
+gradle init
+   {{< /highlight >}}
+   You'll be asked if you want to generate a Gradle build. Enter **yes**. You'll
+   also be prompted to choose a DSL (Groovy or Kotlin). For this tutorial, enter
+   **2** for Kotlin.
+2. Open the generated **build.gradle.kts** file and make the following changes:
+   1. In `repositories`, replace `mavenLocal()` with `mavenCentral()`.
+   2. In `repositories`, declare a repository for Confluent Kafka dependencies:
+      {{< highlight >}}
+maven {
+    url = uri("https://packages.confluent.io/maven/")
+}
+      {{< /highlight >}}
+   3. At the end of the build script, add the following conditional dependency:
+      {{< highlight >}}
+if (project.hasProperty("dataflow-runner")) {
+    dependencies {
+        runtimeOnly("org.apache.beam:beam-runners-google-cloud-dataflow-java:{{< param release_latest >}}")
     }
 }
-{{< /highlight >}}
-1. Add the following task in `build.gradle` to allow you to execute pipelines with Gradle:
-{{< highlight >}}
-task execute (type:JavaExec) {
-    mainClass = System.getProperty("mainClass")
-    classpath = sourceSets.main.runtimeClasspath
-    systemProperties System.getProperties()
-    args System.getProperty("exec.args", "").split()
+      {{< /highlight >}}
+   4. At the end of the build script, add the following task:
+      {{< highlight >}}
+task("execute", JavaExec::class) {
+    classpath = sourceSets["main"].runtimeClasspath
+    mainClass.set(System.getProperty("mainClass"))
 }
-{{< /highlight >}}
-1. Rebuild your project by running:
-{{< highlight >}}
-$ gradle build
-{{< /highlight >}}
+      {{< /highlight >}}
+4. Build your project:
+   {{< highlight >}}
+gradle build
+   {{< /highlight >}}
 
 ## Get sample text
 
-> If you're planning to use the DataflowRunner, you can skip this step. The runner will pull text directly from Google Cloud Storage.
+> If you're planning to use the DataflowRunner, you can skip this step. The
+  runner will pull text directly from Google Cloud Storage.
 
 1. In the **word-count-beam** directory, create a file called **sample.txt**.
-1. Add some text to the file. For this example, you can use the text of Shakespeare's [Sonnets](https://storage.cloud.google.com/apache-beam-samples/shakespeare/sonnets.txt).
+2. Add some text to the file. For this example, use the text of Shakespeare's
+   [King Lear](https://storage.cloud.google.com/apache-beam-samples/shakespeare/kinglear.txt).
 
 ## Run a pipeline
 
-A single Beam pipeline can run on multiple Beam [runners](/documentation#runners), including the [FlinkRunner](/documentation/runners/flink), [SparkRunner](/documentation/runners/spark), [NemoRunner](/documentation/runners/nemo), [JetRunner](/documentation/runners/jet), or [DataflowRunner](/documentation/runners/dataflow). The [DirectRunner](/documentation/runners/direct) is a common runner for getting started, as it runs locally on your machine and requires no specific setup. If you're just trying out Beam and you're not sure what to use, use the [DirectRunner](/documentation/runners/direct).
+A single Beam pipeline can run on multiple Beam
+[runners](/documentation#runners), including the
+[FlinkRunner](/documentation/runners/flink),
+[SparkRunner](/documentation/runners/spark),
+[NemoRunner](/documentation/runners/nemo),
+[JetRunner](/documentation/runners/jet), and
+[DataflowRunner](/documentation/runners/dataflow). The
+[DirectRunner](/documentation/runners/direct) is useful for getting started,
+because it runs on your machine and requires no specific setup. If you're just
+trying out Beam and you're not sure what to use, use the
+[DirectRunner](/documentation/runners/direct).
 
 The general process for running a pipeline goes like this:
 
-1.  Ensure you've done any runner-specific setup.
-1.  Build your command line:
-    1. Specify a runner with `--runner=<runner>` (defaults to the [DirectRunner](/documentation/runners/direct)).
-    1. Add any runner-specific required options.
-    1. Choose input files and an output location that are accessible to the runner. (For example, you can't access a local file if you are running the pipeline on an external cluster.)
-1.  Run the command.
+1.  Complete any runner-specific setup.
+2.  Build your command line:
+    1. Specify a runner with `--runner=<runner>` (defaults to the
+       [DirectRunner](/documentation/runners/direct)).
+    2. Add any runner-specific required options.
+    3. Choose input files and an output location that are accessible to the
+       runner. (For example, you can't access a local file if you are running
+       the pipeline on an external cluster.)
+3.  Run the command.
 
 To run the WordCount pipeline, see the Maven and Gradle examples below.
 
-### Run WordCount Using Maven
+### Run WordCount using Maven
 
 For Unix shells:
 
 {{< runner direct >}}
-$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
-     -Dexec.args="--inputFile=sample.txt --output=counts" -Pdirect-runner
+mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
+    -Dexec.args="--inputFile=sample.txt --output=counts" -Pdirect-runner
 {{< /runner >}}
-
 {{< runner flink >}}
-$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
-     -Dexec.args="--runner=FlinkRunner --inputFile=sample.txt --output=counts" -Pflink-runner
+mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
+    -Dexec.args="--runner=FlinkRunner --inputFile=sample.txt --output=counts" -Pflink-runner
 {{< /runner >}}
-
 {{< runner flinkCluster >}}
-$ mvn package exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
-     -Dexec.args="--runner=FlinkRunner --flinkMaster=<flink master> --filesToStage=target/word-count-beam-bundled-0.1.jar \
-                  --inputFile=sample.txt --output=/tmp/counts" -Pflink-runner
-
-You can monitor the running job by visiting the Flink dashboard at http://<flink master>:8081

Review comment:
       I can't think of a way to do it; I'm open to suggestions. The problem
is that the command can be copied with a button in the docs. If we add a
comment to the runnable command line and the user copies it, I think it might
cause problems, for example when they use the up and down arrow keys to recall
the last command executed. ...
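
For reference, the sub-steps of step 2 in the diff above add up to roughly the following `build.gradle.kts` fragment. This is a sketch, not the text of the PR: it assumes the `java` (or `java-library`) plugin that `gradle init` applies, uses a hypothetical `<beam-version>` placeholder instead of the Hugo `release_latest` parameter, and swaps the eager `task("execute", JavaExec::class)` call for Gradle's lazy `tasks.register<JavaExec>` API, which should behave the same for this task.

```kotlin
// build.gradle.kts fragment (sketch). Assumes the plugins block generated by `gradle init`.
repositories {
    mavenCentral()
    // Repository for the Confluent Kafka dependencies used by the examples
    maven {
        url = uri("https://packages.confluent.io/maven/")
    }
}

// Add the Dataflow runner to the runtime classpath only when the build
// is invoked with -Pdataflow-runner.
if (project.hasProperty("dataflow-runner")) {
    dependencies {
        // <beam-version> is a placeholder for the Beam release you build against
        runtimeOnly("org.apache.beam:beam-runners-google-cloud-dataflow-java:<beam-version>")
    }
}

// Lazily registered equivalent of task("execute", JavaExec::class)
tasks.register<JavaExec>("execute") {
    classpath = sourceSets["main"].runtimeClasspath
    // Main class comes from the -DmainClass system property passed on the command line
    mainClass.set(System.getProperty("mainClass"))
}
```

With that fragment in place, the task would be invoked along the lines of `gradle execute -DmainClass=org.apache.beam.examples.WordCount --args="--inputFile=sample.txt --output=counts"`, adding `-Pdataflow-runner` (and `--runner=DataflowRunner` plus the Dataflow-specific options inside `--args`) to pull in the Dataflow runner. `--args` is the standard `JavaExec` command-line option for program arguments in Gradle 4.9 and later.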




-- 
This is an automated message from the Apache Git Service.