lostluck commented on code in PR #27415:
URL: https://github.com/apache/beam/pull/27415#discussion_r1260286538
##########
sdks/go/README.md:
##########
@@ -17,26 +17,87 @@
under the License.
-->
-# Go SDK
+# Go SDK Overview
The Apache Beam Go SDK is the Beam Model implemented in the [Go Programming
Language](https://go.dev/).
It is based on the following initial
[design](https://s.apache.org/beam-go-sdk-design-rfc).
+Below describes requirements, how to run examples, execute tests, and
contribute to the Go SDK.
-## How to run the examples
+_A note on Beam specific terminology used in this README._
-**Prerequisites**: to use Google Cloud sources and sinks (default for
-most examples), follow the setup
-[here](https://beam.apache.org/documentation/runners/dataflow/). You can
-verify that it works by running the corresponding Java example.
+_This README uses minimally necessary Beam related terminology to help you
determine requirements, usage and
+contribution to the Go SDK. A [definitions](#definitions) section below
provides short definitions for you to achieve
+these aims._
-The examples are normal Go programs and are most easily run directly.
-They are parameterized by Go flags.
-For example, to run wordcount on the Go direct runner do:
+# Requirements
+Aside from the obvious [go](https:/go.dev), you will need to clone the
Review Comment:
Nit: Phrasing. There's no need to call out something as obvious. Simply
state the obvious requirement.
```suggestion
An up-to-date version of Go must be installed. See https://go.dev/doc/install
for instructions.
Experience using the Go programming language is strongly recommended. The
best place to learn how to program Go is from the Go learning resources.
Completing the tutorials https://go.dev/doc/tutorial/getting-started and
https://go.dev/doc/tutorial/create-module is a great way to start.
```
Side nit: The URL was missing a /.
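To illustrate why the missing slash matters, here is a minimal sketch using Go's
standard `net/url` package. The helper name `hostOf` is made up for this
example. With a single slash the parser finds no authority part, so the host
comes back empty and `go.dev` is treated as a path.

```go
package main

import (
	"fmt"
	"net/url"
	"os"
)

// hostOf returns the host component that net/url extracts from raw.
// (Hypothetical helper, used only for this illustration.)
func hostOf(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	return u.Host, nil
}

func main() {
	// The first URL is the one from the README diff, the second is the fix.
	for _, raw := range []string{"https:/go.dev", "https://go.dev"} {
		host, err := hostOf(raw)
		if err != nil {
			fmt.Fprintln(os.Stderr, "parse failed:", err)
			os.Exit(1)
		}
		fmt.Printf("%s -> host=%q\n", raw, host)
	}
}
```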
##########
sdks/go/README.md:
##########
@@ -17,26 +17,87 @@
under the License.
-->
-# Go SDK
+# Go SDK Overview
The Apache Beam Go SDK is the Beam Model implemented in the [Go Programming
Language](https://go.dev/).
It is based on the following initial
[design](https://s.apache.org/beam-go-sdk-design-rfc).
+Below describes requirements, how to run examples, execute tests, and
contribute to the Go SDK.
-## How to run the examples
+_A note on Beam specific terminology used in this README._
-**Prerequisites**: to use Google Cloud sources and sinks (default for
-most examples), follow the setup
-[here](https://beam.apache.org/documentation/runners/dataflow/). You can
-verify that it works by running the corresponding Java example.
+_This README uses minimally necessary Beam related terminology to help you
determine requirements, usage and
+contribution to the Go SDK. A [definitions](#definitions) section below
provides short definitions for you to achieve
+these aims._
-The examples are normal Go programs and are most easily run directly.
-They are parameterized by Go flags.
-For example, to run wordcount on the Go direct runner do:
+# Requirements
+Aside from the obvious [go](https:/go.dev), you will need to clone the
+[Beam Repository](https://github.com/apache/beam) on your local machine.
+
+To keep terminal commands clear in this README, the following is assumed:
+
+```
+export BEAM_ROOT=path/to/where/you/clone/beam/repository
+```
+
+Only required to run examples, execute tests, and contribute to the Go SDK,
[git clone](https://git-scm.com/docs/git-clone)
+the Beam repository:
+
+```sh
+git clone https://github.com/apache/beam.git $BEAM_ROOT
```
-$ pwd
-[...]/sdks/go
-$ go run examples/wordcount/wordcount.go --output=/tmp/result.txt
+
+or if you
[fork](https://docs.github.com/en/get-started/quickstart/fork-a-repo) the Beam
repository into your GitHub
+account with `<username>`.
+
+```sh
+git clone git@github.com:<username>/beam.git $BEAM_ROOT
+```
+
+
+# Run Examples
+
+## Additional Requirements
+
+In addition to the [common requirements](#requirements) listed above, the
following lists anything additional for
+running most of the [examples in this repository](examples).
+
+### Google Cloud Setup
+
+Most examples require Google Cloud related resources that serve as data
[sources](#source) and [sinks](#sink).
+Follow prerequisites listed as setup in
https://beam.apache.org/documentation/runners/dataflow for the Java SDK
+(It's optional to run the Java example for validation in that referenced
documentation but not required).
+
+## Usage
Review Comment:
An SDK developer shouldn't be informed how to run the examples. The examples
are there for users, so it's appropriate to refer to that documentation instead.
The Go SDK quickstart demonstrates running examples:
https://beam.apache.org/get-started/quickstart-go/
I have a task to demonstrate the SDK better for users (because the current
quickstart is a bit too quick...)
https://github.com/apache/beam/issues/27300
But someone who works with Go already knows they should be in the Beam Go
SDK's go.mod directory. We shouldn't be repeating Go best practices. We should
provide the necessary knowledge (where the go.mod file is) and rely on users
already knowing Go.
You'll notice the bent here is that someone shouldn't be trying to
contribute to Beam to learn Go from scratch. We can't teach potential
contributors Go. It doesn't scale. (We can help fix and improve their PRs, but
we shouldn't be providing documentation that will become stale or be very
environment dependent.)
##########
sdks/go/README.md:
##########
@@ -84,56 +145,237 @@ sentence: 1
purse: 6
```
-To run wordcount on dataflow runner do:
+#### Run Example on the Dataflow Runner
+
+To run [examples/wordcount](examples/wordcount) on the [Dataflow
Runner](#dataflow-runner) run the following. See
+[pkg/beam/runners/dataflow/dataflow.go](pkg/beam/runners/dataflow/dataflow.go)
and
+[examples/wordcount/wordcount.go](examples/wordcount/wordcount.go) for a
descriptions of required and optional flags.
+
+1. Set your
+[Google Cloud
project](https://cloud.google.com/resource-manager/docs/cloud-platform-resource-hierarchy#projects):
+ ```
+ GCP_PROJECT=$(gcloud config get-value project)
+ ```
+
+2. Set your [Google Cloud Compute
region](https://cloud.google.com/compute/docs/regions-zones):
+ ```
+ GCP_REGION=us-central1
+ ```
+
+3. Create and set your [Google Cloud Storage
Bucket](https://cloud.google.com/storage/docs/buckets):
+ (_Note this is WITHOUT the `gs://`_)
+ ```
+ GCS_BUCKET=<your-google-cloud-storage-bucket-name>
+ ```
+
+4. Run the word count example
+ (see
+ [Google Cloud
Documentation](https://cloud.google.com/dataflow/docs/quickstarts/create-pipeline-go#run_the_pipeline_on_the_service)
+ for more details):
+ ```
+ go run go/examples/wordcount/wordcount.go --runner=dataflow \
+ --sdk_container_image=apache/beam_go_sdk:latest \
+ --project=$GCP_PROJECT \
+ --region=$GCP_REGION \
+ --staging_location=gs://$GCS_BUCKET/staging \
+ --output=gs://$GCS_BUCKET/output
+ ```
+
+ You should see:
+ ```
+ 2023/07/09 10:57:13 Submitted job: <job-id>
+ 2023/07/09 10:57:13 Console:
https://console.cloud.google.com/dataflow/jobs/us-central1/<job-id>?project=<project>
+ 2023/07/09 11:02:56 Job state: JOB_STATE_PENDING ...
+ 2023/07/09 11:03:26 Job still running ...
+ ```
+
+5. After seeing `Job <job-id> succeeded!` you can inspect the resulting output.
+
+ Run the following command to inspect the resulting output.
+
+ ```
+ gcloud storage cat "gs://$GCS_BUCKET/output*" | head
+ ```
+
+ You should see something similar to the following:
+ ```
+ feature: 1
+ block: 1
+ Cried: 1
+ scatter'd: 1
+ she: 44
+ sudden: 1
+ silly: 1
+ More: 6
+ out: 68
+ believe: 3
+ ```
+
+#### Troubleshooting tips
+
+If you get the following error:
+```
+googleapi: Error 400: User project specified in the request is invalid.
+```
+
+Try the following:
+1. Make sure you followed the [Google Cloud Setup](#google-cloud-setup) above.
+2. Configure the [gcloud](https://cloud.google.com/sdk/docs/install-sdk) with
your project:
+ ```
+ gcloud config set project <your-project>
+ ```
+3. Consider re-running BOTH `gcloud auth login` and `gcloud auth
application-default login`
+ commands.
+
+## Testing
+
+### Requirements (For runner validations)
+
+Below lists additional requirements to execute tests in this repository on
your local machine.
+
+#### 1. Java and Python
+
+You **do not** need to know or care about Java or Python to use the Go SDK for
your data processing goals.
+
+Java is **only** required to execute any [gradle](https://gradle.org/)
commands configured in the
+[Beam](https://github.com/apache/beam) repository. It will be obvious whether
you will execute gradle commands in
+sections below. You **do not** need to install [gradle](https://gradle.org/)
and simply use the enclosed
+`$BEAM_ROOT/gradlew` executable available at the root of the
[Beam](https://github.com/apache/beam) repository.
+
+Python is **only** required if you need to validate tests against the
+[Portable Python Runner](#portable-python-runner). If you are not testing
against this
+runner, ignore the Python requirement.
+
+#### 2. Docker
+
+[Docker](https://www.docker.com/) is required for certain but not all runners
[See definition](#runner).
+
+As of this writing, [colima](https://github.com/abiosoft/colima), a preferred
docker alternative for some developers,
+did not work.
Review Comment:
+1 It's not our job to assume we can fix arbitrary user setups, unless
we're trying to support that specifically. We don't test this, so we can't
guarantee it.
##########
sdks/go/README.md:
##########
@@ -17,26 +17,87 @@
under the License.
-->
-# Go SDK
+# Go SDK Overview
The Apache Beam Go SDK is the Beam Model implemented in the [Go Programming
Language](https://go.dev/).
It is based on the following initial
[design](https://s.apache.org/beam-go-sdk-design-rfc).
+Below describes requirements, how to run examples, execute tests, and
contribute to the Go SDK.
-## How to run the examples
+_A note on Beam specific terminology used in this README._
-**Prerequisites**: to use Google Cloud sources and sinks (default for
-most examples), follow the setup
-[here](https://beam.apache.org/documentation/runners/dataflow/). You can
-verify that it works by running the corresponding Java example.
+_This README uses minimally necessary Beam related terminology to help you
determine requirements, usage and
+contribution to the Go SDK. A [definitions](#definitions) section below
provides short definitions for you to achieve
+these aims._
Review Comment:
+1. We shouldn't be providing dedicated resources for SDK
authors/contributors outside of an SDK Authoring Guide (which we don't have ...
yet). Contributors to specific SDKs should come in from a user's perspective,
and we shouldn't replicate things on a per-SDK basis. That is, it's
inappropriate to have the common definitions here in Go SDK specific places
unless they are unique to the Go SDK.
##########
sdks/go/README.md:
##########
@@ -84,56 +145,237 @@ sentence: 1
purse: 6
```
-To run wordcount on dataflow runner do:
+#### Run Example on the Dataflow Runner
Review Comment:
This should definitely only belong in the Go Quickstart or similar
tutorial(s). It's not appropriate for this location.
This is good content; this is just not the place for it.
##########
sdks/go/README.md:
##########
@@ -84,56 +145,237 @@ sentence: 1
purse: 6
```
-To run wordcount on dataflow runner do:
+#### Run Example on the Dataflow Runner
+
+To run [examples/wordcount](examples/wordcount) on the [Dataflow
Runner](#dataflow-runner) run the following. See
+[pkg/beam/runners/dataflow/dataflow.go](pkg/beam/runners/dataflow/dataflow.go)
and
+[examples/wordcount/wordcount.go](examples/wordcount/wordcount.go) for a
descriptions of required and optional flags.
+
+1. Set your
+[Google Cloud
project](https://cloud.google.com/resource-manager/docs/cloud-platform-resource-hierarchy#projects):
+ ```
+ GCP_PROJECT=$(gcloud config get-value project)
+ ```
+
+2. Set your [Google Cloud Compute
region](https://cloud.google.com/compute/docs/regions-zones):
+ ```
+ GCP_REGION=us-central1
+ ```
+
+3. Create and set your [Google Cloud Storage
Bucket](https://cloud.google.com/storage/docs/buckets):
+ (_Note this is WITHOUT the `gs://`_)
+ ```
+ GCS_BUCKET=<your-google-cloud-storage-bucket-name>
+ ```
+
+4. Run the word count example
+ (see
+ [Google Cloud
Documentation](https://cloud.google.com/dataflow/docs/quickstarts/create-pipeline-go#run_the_pipeline_on_the_service)
+ for more details):
+ ```
+ go run go/examples/wordcount/wordcount.go --runner=dataflow \
+ --sdk_container_image=apache/beam_go_sdk:latest \
+ --project=$GCP_PROJECT \
+ --region=$GCP_REGION \
+ --staging_location=gs://$GCS_BUCKET/staging \
+ --output=gs://$GCS_BUCKET/output
+ ```
+
+ You should see:
+ ```
+ 2023/07/09 10:57:13 Submitted job: <job-id>
+ 2023/07/09 10:57:13 Console:
https://console.cloud.google.com/dataflow/jobs/us-central1/<job-id>?project=<project>
+ 2023/07/09 11:02:56 Job state: JOB_STATE_PENDING ...
+ 2023/07/09 11:03:26 Job still running ...
+ ```
+
+5. After seeing `Job <job-id> succeeded!` you can inspect the resulting output.
+
+ Run the following command to inspect the resulting output.
+
+ ```
+ gcloud storage cat "gs://$GCS_BUCKET/output*" | head
+ ```
+
+ You should see something similar to the following:
+ ```
+ feature: 1
+ block: 1
+ Cried: 1
+ scatter'd: 1
+ she: 44
+ sudden: 1
+ silly: 1
+ More: 6
+ out: 68
+ believe: 3
+ ```
+
+#### Troubleshooting tips
+
+If you get the following error:
+```
+googleapi: Error 400: User project specified in the request is invalid.
+```
+
+Try the following:
+1. Make sure you followed the [Google Cloud Setup](#google-cloud-setup) above.
+2. Configure the [gcloud](https://cloud.google.com/sdk/docs/install-sdk) with
your project:
+ ```
+ gcloud config set project <your-project>
+ ```
+3. Consider re-running BOTH `gcloud auth login` and `gcloud auth
application-default login`
+ commands.
+
+## Testing
+
+### Requirements (For runner validations)
+
+Below lists additional requirements to execute tests in this repository on
your local machine.
+
+#### 1. Java and Python
+
+You **do not** need to know or care about Java or Python to use the Go SDK for
your data processing goals.
+
+Java is **only** required to execute any [gradle](https://gradle.org/)
commands configured in the
+[Beam](https://github.com/apache/beam) repository. It will be obvious whether
you will execute gradle commands in
+sections below. You **do not** need to install [gradle](https://gradle.org/)
and simply use the enclosed
+`$BEAM_ROOT/gradlew` executable available at the root of the
[Beam](https://github.com/apache/beam) repository.
+
+Python is **only** required if you need to validate tests against the
+[Portable Python Runner](#portable-python-runner). If you are not testing
against this
+runner, ignore the Python requirement.
+
+#### 2. Docker
+
+[Docker](https://www.docker.com/) is required for certain but not all runners
[See definition](#runner).
+
+As of this writing, [colima](https://github.com/abiosoft/colima), a preferred
docker alternative for some developers,
+did not work.
+
+#### 3. Flock
+
+[sdks/go/run_with_go_version.sh](run_with_go_version.sh) requires the use of
Review Comment:
This isn't necessary to mention. The gradle commands are largely for the
GitHub Actions or the Java developers. The reason flock exists is to avoid race
conditions on installing Go versions, and once prism is the default, users
won't need to run these scripts locally; they should use `go test`.
##########
sdks/go/README.md:
##########
@@ -17,26 +17,87 @@
under the License.
-->
-# Go SDK
+# Go SDK Overview
The Apache Beam Go SDK is the Beam Model implemented in the [Go Programming
Language](https://go.dev/).
It is based on the following initial
[design](https://s.apache.org/beam-go-sdk-design-rfc).
+Below describes requirements, how to run examples, execute tests, and
contribute to the Go SDK.
-## How to run the examples
+_A note on Beam specific terminology used in this README._
-**Prerequisites**: to use Google Cloud sources and sinks (default for
-most examples), follow the setup
-[here](https://beam.apache.org/documentation/runners/dataflow/). You can
-verify that it works by running the corresponding Java example.
+_This README uses minimally necessary Beam related terminology to help you
determine requirements, usage and
+contribution to the Go SDK. A [definitions](#definitions) section below
provides short definitions for you to achieve
+these aims._
-The examples are normal Go programs and are most easily run directly.
-They are parameterized by Go flags.
-For example, to run wordcount on the Go direct runner do:
+# Requirements
+Aside from the obvious [go](https:/go.dev), you will need to clone the
+[Beam Repository](https://github.com/apache/beam) on your local machine.
+
+To keep terminal commands clear in this README, the following is assumed:
+
+```
+export BEAM_ROOT=path/to/where/you/clone/beam/repository
+```
+
+Only required to run examples, execute tests, and contribute to the Go SDK,
[git clone](https://git-scm.com/docs/git-clone)
+the Beam repository:
+
+```sh
+git clone https://github.com/apache/beam.git $BEAM_ROOT
Review Comment:
Let's not rehash what is already documented on the contribution guide. We
shouldn't explain git/github here.
https://beam.apache.org/contribute/get-started-contributing/
##########
sdks/go/README.md:
##########
@@ -84,56 +145,237 @@ sentence: 1
purse: 6
```
-To run wordcount on dataflow runner do:
+#### Run Example on the Dataflow Runner
+
+To run [examples/wordcount](examples/wordcount) on the [Dataflow
Runner](#dataflow-runner) run the following. See
+[pkg/beam/runners/dataflow/dataflow.go](pkg/beam/runners/dataflow/dataflow.go)
and
+[examples/wordcount/wordcount.go](examples/wordcount/wordcount.go) for a
descriptions of required and optional flags.
+
+1. Set your
+[Google Cloud
project](https://cloud.google.com/resource-manager/docs/cloud-platform-resource-hierarchy#projects):
+ ```
+ GCP_PROJECT=$(gcloud config get-value project)
+ ```
+
+2. Set your [Google Cloud Compute
region](https://cloud.google.com/compute/docs/regions-zones):
+ ```
+ GCP_REGION=us-central1
+ ```
+
+3. Create and set your [Google Cloud Storage
Bucket](https://cloud.google.com/storage/docs/buckets):
+ (_Note this is WITHOUT the `gs://`_)
+ ```
+ GCS_BUCKET=<your-google-cloud-storage-bucket-name>
+ ```
+
+4. Run the word count example
+ (see
+ [Google Cloud
Documentation](https://cloud.google.com/dataflow/docs/quickstarts/create-pipeline-go#run_the_pipeline_on_the_service)
+ for more details):
+ ```
+ go run go/examples/wordcount/wordcount.go --runner=dataflow \
+ --sdk_container_image=apache/beam_go_sdk:latest \
+ --project=$GCP_PROJECT \
+ --region=$GCP_REGION \
+ --staging_location=gs://$GCS_BUCKET/staging \
+ --output=gs://$GCS_BUCKET/output
+ ```
+
+ You should see:
+ ```
+ 2023/07/09 10:57:13 Submitted job: <job-id>
+ 2023/07/09 10:57:13 Console:
https://console.cloud.google.com/dataflow/jobs/us-central1/<job-id>?project=<project>
+ 2023/07/09 11:02:56 Job state: JOB_STATE_PENDING ...
+ 2023/07/09 11:03:26 Job still running ...
+ ```
+
+5. After seeing `Job <job-id> succeeded!` you can inspect the resulting output.
+
+ Run the following command to inspect the resulting output.
+
+ ```
+ gcloud storage cat "gs://$GCS_BUCKET/output*" | head
+ ```
+
+ You should see something similar to the following:
+ ```
+ feature: 1
+ block: 1
+ Cried: 1
+ scatter'd: 1
+ she: 44
+ sudden: 1
+ silly: 1
+ More: 6
+ out: 68
+ believe: 3
+ ```
+
+#### Troubleshooting tips
+
+If you get the following error:
+```
+googleapi: Error 400: User project specified in the request is invalid.
+```
+
+Try the following:
+1. Make sure you followed the [Google Cloud Setup](#google-cloud-setup) above.
+2. Configure the [gcloud](https://cloud.google.com/sdk/docs/install-sdk) with
your project:
+ ```
+ gcloud config set project <your-project>
+ ```
+3. Consider re-running BOTH `gcloud auth login` and `gcloud auth
application-default login`
+ commands.
+
+## Testing
+
+### Requirements (For runner validations)
+
+Below lists additional requirements to execute tests in this repository on
your local machine.
+
+#### 1. Java and Python
+
+You **do not** need to know or care about Java or Python to use the Go SDK for
your data processing goals.
+
+Java is **only** required to execute any [gradle](https://gradle.org/)
commands configured in the
+[Beam](https://github.com/apache/beam) repository. It will be obvious whether
you will execute gradle commands in
+sections below. You **do not** need to install [gradle](https://gradle.org/)
and simply use the enclosed
+`$BEAM_ROOT/gradlew` executable available at the root of the
[Beam](https://github.com/apache/beam) repository.
+
+Python is **only** required if you need to validate tests against the
+[Portable Python Runner](#portable-python-runner). If you are not testing
against this
+runner, ignore the Python requirement.
+
+#### 2. Docker
+
+[Docker](https://www.docker.com/) is required for certain but not all runners
[See definition](#runner).
+
+As of this writing, [colima](https://github.com/abiosoft/colima), a preferred
docker alternative for some developers,
+did not work.
+
+#### 3. Flock
+
+[sdks/go/run_with_go_version.sh](run_with_go_version.sh) requires the use of
+[flock](https://github.com/discoteq/flock).
+
+### Execution
+
+#### 1. Navigate to the go.mod directory
+
+Open a terminal and navigate into the go.mod containing directory of the Beam
repository. (See above for what
+`$BEAM_ROOT` means). Notice that you are entering the `$BEAM_ROOT/sdks` and
not `$BEAM_ROOT/sdks/go`.
+
+```sh
+cd $BEAM_ROOT/sdks
+```
+
+#### 2. Run Go test
+
+Run go test as you would any Go project.
+For unit tests in the exported `pkg/beam` package:
```
-$ go run wordcount.go --runner=dataflow --project=<YOUR_GCP_PROJECT>
--region=<YOUR_GCP_REGION> --staging_location=<YOUR_GCS_LOCATION>/staging
--worker_harness_container_image=<YOUR_SDK_HARNESS_IMAGE_LOCATION>
--output=<YOUR_GCS_LOCATION>/output
+go test ./go/pkg/beam...
```
-The output is a GCS file in this case:
+For integration, load, and regression tests:
+```
+go test ./go/test/...
+```
+
+### Runner validations
Review Comment:
This section above should either refer to the cwiki page on adding
integration tests to the SDK, or replicate that content. Contributors should
be aware that gradle commands will run those test suites on the various
runners.
TBH what we have here feels like something that should be mentioned in a
language-agnostic manner in the Beam Contribution Guide, and this section
should refer to gradle commands that execute the validation on each runner. We
should not try to justify the Beam infra architectural choices here. We need
to assert that they exist, and how contributors use them.
##########
sdks/go/README.md:
##########
@@ -84,56 +145,237 @@ sentence: 1
purse: 6
```
-To run wordcount on dataflow runner do:
+#### Run Example on the Dataflow Runner
+
+To run [examples/wordcount](examples/wordcount) on the [Dataflow
Runner](#dataflow-runner) run the following. See
+[pkg/beam/runners/dataflow/dataflow.go](pkg/beam/runners/dataflow/dataflow.go)
and
+[examples/wordcount/wordcount.go](examples/wordcount/wordcount.go) for a
descriptions of required and optional flags.
+
+1. Set your
+[Google Cloud
project](https://cloud.google.com/resource-manager/docs/cloud-platform-resource-hierarchy#projects):
+ ```
+ GCP_PROJECT=$(gcloud config get-value project)
+ ```
+
+2. Set your [Google Cloud Compute
region](https://cloud.google.com/compute/docs/regions-zones):
+ ```
+ GCP_REGION=us-central1
+ ```
+
+3. Create and set your [Google Cloud Storage
Bucket](https://cloud.google.com/storage/docs/buckets):
+ (_Note this is WITHOUT the `gs://`_)
+ ```
+ GCS_BUCKET=<your-google-cloud-storage-bucket-name>
+ ```
+
+4. Run the word count example
+ (see
+ [Google Cloud
Documentation](https://cloud.google.com/dataflow/docs/quickstarts/create-pipeline-go#run_the_pipeline_on_the_service)
+ for more details):
+ ```
+ go run go/examples/wordcount/wordcount.go --runner=dataflow \
+ --sdk_container_image=apache/beam_go_sdk:latest \
+ --project=$GCP_PROJECT \
+ --region=$GCP_REGION \
+ --staging_location=gs://$GCS_BUCKET/staging \
+ --output=gs://$GCS_BUCKET/output
+ ```
+
+ You should see:
+ ```
+ 2023/07/09 10:57:13 Submitted job: <job-id>
+ 2023/07/09 10:57:13 Console:
https://console.cloud.google.com/dataflow/jobs/us-central1/<job-id>?project=<project>
+ 2023/07/09 11:02:56 Job state: JOB_STATE_PENDING ...
+ 2023/07/09 11:03:26 Job still running ...
+ ```
+
+5. After seeing `Job <job-id> succeeded!` you can inspect the resulting output.
+
+ Run the following command to inspect the resulting output.
+
+ ```
+ gcloud storage cat "gs://$GCS_BUCKET/output*" | head
+ ```
+
+ You should see something similar to the following:
+ ```
+ feature: 1
+ block: 1
+ Cried: 1
+ scatter'd: 1
+ she: 44
+ sudden: 1
+ silly: 1
+ More: 6
+ out: 68
+ believe: 3
+ ```
+
+#### Troubleshooting tips
+
+If you get the following error:
+```
+googleapi: Error 400: User project specified in the request is invalid.
+```
+
+Try the following:
+1. Make sure you followed the [Google Cloud Setup](#google-cloud-setup) above.
+2. Configure the [gcloud](https://cloud.google.com/sdk/docs/install-sdk) with
your project:
+ ```
+ gcloud config set project <your-project>
+ ```
+3. Consider re-running BOTH `gcloud auth login` and `gcloud auth
application-default login`
+ commands.
+
+## Testing
+
+### Requirements (For runner validations)
+
+Below lists additional requirements to execute tests in this repository on
your local machine.
+
+#### 1. Java and Python
+
+You **do not** need to know or care about Java or Python to use the Go SDK for
your data processing goals.
+
+Java is **only** required to execute any [gradle](https://gradle.org/)
commands configured in the
+[Beam](https://github.com/apache/beam) repository. It will be obvious whether
you will execute gradle commands in
+sections below. You **do not** need to install [gradle](https://gradle.org/)
and simply use the enclosed
+`$BEAM_ROOT/gradlew` executable available at the root of the
[Beam](https://github.com/apache/beam) repository.
+
+Python is **only** required if you need to validate tests against the
+[Portable Python Runner](#portable-python-runner). If you are not testing
against this
+runner, ignore the Python requirement.
+
+#### 2. Docker
+
+[Docker](https://www.docker.com/) is required for certain but not all runners
[See definition](#runner).
+
+As of this writing, [colima](https://github.com/abiosoft/colima), a preferred
docker alternative for some developers,
+did not work.
+
+#### 3. Flock
+
+[sdks/go/run_with_go_version.sh](run_with_go_version.sh) requires the use of
+[flock](https://github.com/discoteq/flock).
+
+### Execution
+
+#### 1. Navigate to the go.mod directory
+
+Open a terminal and navigate into the go.mod containing directory of the Beam
repository. (See above for what
+`$BEAM_ROOT` means). Notice that you are entering the `$BEAM_ROOT/sdks` and
not `$BEAM_ROOT/sdks/go`.
+
+```sh
+cd $BEAM_ROOT/sdks
+```
+
+#### 2. Run Go test
+
+Run go test as you would any Go project.
+For unit tests in the exported `pkg/beam` package:
```
-$ go run wordcount.go --runner=dataflow --project=<YOUR_GCP_PROJECT>
--region=<YOUR_GCP_REGION> --staging_location=<YOUR_GCS_LOCATION>/staging
--worker_harness_container_image=<YOUR_SDK_HARNESS_IMAGE_LOCATION>
--output=<YOUR_GCS_LOCATION>/output
+go test ./go/pkg/beam...
```
-The output is a GCS file in this case:
+For integration, load, and regression tests:
+```
+go test ./go/test/...
+```
+
+### Runner validations
+
+The following documents various [Runner](#runner) validation tests related to
test execution of the Go SDK
+**in this repository** (in contrast to your own Go SDK dependent repository
and projects).
+You'll see some documentation that deviates from Go test execution convention
to run gradle commands.
Review Comment:
I'm -1 on this suggestion since it's another place I'll need to change in 3
weeks when I replace the Go direct runner with Prism.