rszper commented on code in PR #30879:
URL: https://github.com/apache/beam/pull/30879#discussion_r1562791368


##########
contributor-docs/code-change-guide.md:
##########
@@ -0,0 +1,516 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+This guide is for Beam users and developers who want to change and test Beam code.
+Specifically, it covers:
+
+1. Testing code changes locally.
+
+2. Building Beam artifacts with modified Beam code and using them in pipelines.
+
+# Repository structure
+
+The Apache Beam GitHub repository (Beam repo) is essentially a "mono repo". It
+contains everything in the Beam project, from the SDK itself to test
+infrastructure, dashboards, the [Beam website](https://beam.apache.org),
+and the [Beam Playground](https://play.beam.apache.org).
+
+## Gradle quick start
+
+The Beam repo is a single Gradle project that contains all components, including Python,
+Go, and the website. It is useful to be familiar with Gradle's multi-project
+structure:
+https://docs.gradle.org/current/userguide/multi_project_builds.html
+
+### Gradle key concepts
+
+* project: a folder that contains a `build.gradle` file
+* task: an action defined in `build.gradle`
+* plugin: applied in a project's `build.gradle`; provides pre-defined tasks and hierarchies
+
+For example, common tasks for a Java (sub)project include:
+
+- compileJava
+- compileTestJava
+- test
+- integrationTest
+
+To run a Gradle task, use `./gradlew -p <project path> <task>` or,
+equivalently, `./gradlew :project:path:task_name`. For example:
+
+```
+./gradlew -p sdks/java/core compileJava
+
+./gradlew :sdks:java:harness:test
+```
+
+### Gradle project configuration: Beam specific
+
+* A **huge** plugin, `buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin`, manages everything.
+
+Then, for example, in each Java (sub)project, `build.gradle` starts with
+
+```groovy
+
+apply plugin: 'org.apache.beam.module'
+
+applyJavaNature( ... )
+```
+
+Relevant uses of BeamModulePlugin:
+* Managing Java dependencies
+* Configuring projects (Java, Python, Go, Proto, Docker, Grpc, Avro, etc.)
+  * Java -> `applyJavaNature`; Python -> `applyPythonNature`, etc.
+  * Defining common custom tasks for each type of project
+    * `test`: run Java unit tests
+    * `spotlessApply`: format Java code
+
+## Code paths
+
+Example code paths relevant for SDK development:
+
+* `sdks/java` Java SDK
+  * `sdks/java/core` Java core
+  * `sdks/java/harness` SDK harness (entrypoint of SDK container)
+
+* `runners` runner support, written in Java. For example,
+  * `runners/direct-java` Java direct runner
+  * `runners/flink-java` Java Flink runner
+  * `runners/google-cloud-dataflow-java` Dataflow runner (job submission, translation, etc.)
+    * `runners/google-cloud-dataflow-java/worker` worker for the Dataflow legacy runner
+
+* `sdks/python` contains the setup file and scripts to trigger test suites
+  * `sdks/python/apache_beam` the actual Beam package
+    * `sdks/python/apache_beam/runners/worker` SDK worker harness entrypoint, state sampler
+    * `sdks/python/apache_beam/io` IO connectors
+    * `sdks/python/apache_beam/transforms` most "core" components
+    * `sdks/python/apache_beam/ml` Beam ML
+    * `sdks/python/apache_beam/runners` runner implementations and wrappers
+    * ...
+
+* `sdks/go` Go SDK
+
+* `.github/workflows` GitHub Actions workflows (e.g. tests run on PRs). Most
+  workflows just run a single Gradle command. Check which command a test runs
+  so that you can run the same command locally during development.
+
+## Environment setup
+
+Refer to the [Contributing guide](../CONTRIBUTING.md) to set up your local
+development environment first. If you intend to use Dataflow, refer to the
+[Google Cloud documentation](https://cloud.google.com/dataflow/docs/quickstarts/create-pipeline-java)
+to set up gcloud credentials.
+
+To check whether your environment is set up, confirm that the following are
+available on your PATH:
+
+* A Java environment (any supported Java version; Java 8 is preferred as of 2024).
+  * Needed for all development because Beam is a Gradle project (which runs on the JVM)
+  * Recommended: Use [sdkman](https://sdkman.io/install) to manage Java versions
+* A Python environment (any supported Python version)
+  * Needed for Python SDK development
+  * Recommended: Use [pyenv](https://github.com/pyenv/pyenv) and a
+    [virtual environment](https://docs.python.org/3/library/venv.html) to manage Python versions
+* A Go environment. Install the latest Go version.
+  * Needed for Go SDK development and SDK container changes (for all SDKs), because
+    the container entrypoint scripts are written in Go.
+* A Docker environment.
+  * Needed for SDK container changes, some cross-language functionality (if you run an
+    SDK container image), portable runners (using a job server), etc.
+
+For example:
+- Testing a code change in `sdks/java/io/google-cloud-platform` requires a Java environment.
+- Testing a code change in `sdks/java/harness` requires a Java environment, a Go
+  environment, and a Docker environment (to compile and build the Java SDK harness container image).
+- Testing a code change in `sdks/python/apache_beam` requires a Python environment.
+
+# Java Guide
+
+## IDE (IntelliJ) Setup
+
+1. From IntelliJ, open `/beam` (**Important:** open the repository root directory, not
+  `sdks/java` or another subdirectory).
+
+2. Wait for indexing to finish (this takes several minutes).
+
+If the prerequisites are met, the project should just work, because Gradle is a self-contained build tool.
+
+To check that the project loaded successfully, open `examples/java/build.gradle`. There should
+be a "Run" button beside the wordCount task. Click the button to compile
+and run the wordCount example.
+
+<img width="631" alt="image" src="https://github.com/apache/beam/assets/8010435/f5203e8e-0f9c-4eaa-895b-e16f68a808a2">
+
+**Note:** An IDE is not required for changing and testing the code. Because Beam is a
+Gradle project, tests can also be run from the Gradle command line, as described below.
+
+## Console (shell) setup
+
+Equivalent command line:
+
+```shell
+$ cd beam
+$ ./gradlew :examples:java:wordCount
+```
+
+When the command finishes, the Gradle build log should look like the following:
+
+```
+...
+BUILD SUCCESSFUL in 2m 32s
+96 actionable tasks: 9 executed, 87 up-to-date
+3:41:06 PM: Execution finished 'wordCount'.
+```
+
+Then check the output file:
+
+```shell
+
+$ head /tmp/output.txt*
+==> /tmp/output.txt-00000-of-00003 <==
+should: 38
+bites: 1
+depraved: 1
+gauntlet: 1
+battle: 6
+sith: 2
+cools: 1
+natures: 1
+hedge: 1
+words: 9
+
+==> /tmp/output.txt-00001-of-00003 <==
+elements: 1
+Advise: 2
+fearful: 2
+towards: 4
+ready: 8
+pared: 1
+left: 8
+safe: 4
+canst: 7
+warrant: 2
+
+==> /tmp/output.txt-00002-of-00003 <==
+chanced: 1
+...
+```
+
+*What does this command do?*
+
+It compiles the Beam SDK and the word count pipeline (the "hello world" program of
+data processing), then runs the pipeline on the Direct Runner.
+
+## Run a unit test
+
+Now, suppose you have made a code change in the Java SDK (e.g. in `sdks/java/io/jdbc`)
+and want to run the relevant unit tests locally to verify the change.
+
+Tests are located under the `src/test/java` folder of each project. Unit test files are
+named `.../**Test.java` and integration test files are named `.../**IT.java`.
+For example,
+
+* Run all unit tests under a project
+  ```
+  ./gradlew :sdks:java:harness:test
+  ```
+  The JUnit report (an HTML page) can then be found at `<invoked_project>/build/reports/tests/test/index.html`.
+
+* Run a specific test

Review Comment:
   ```suggestion
   * To run a specific test, use the following commands:
   ```
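
For context, here is a rough sketch of the kind of command this bullet might introduce. It assumes Gradle's standard `--tests` filter and the `-p <project path>` / `:project:path:task` equivalence described earlier in the guide. The helper `single_test_cmd` and the class name `ExampleTest` are hypothetical and are not taken from the PR. The function only prints the command, so it can be tried without a Beam checkout.

```shell
# Print the Gradle command that runs selected tests in a Java (sub)project.
# $1: project directory relative to the repo root, e.g. sdks/java/harness
# $2: test filter passed to Gradle's --tests option
single_test_cmd() {
  # Gradle project paths use ':' where the directory path uses '/'.
  project=":$(printf '%s' "$1" | tr '/' ':')"
  printf "./gradlew %s:test --tests '%s'\n" "$project" "$2"
}

# ExampleTest is a hypothetical class name used only for illustration.
single_test_cmd sdks/java/harness '*ExampleTest'
```

Quoting the filter keeps the shell from expanding the `*` wildcard before Gradle sees it.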



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
