dariabezkorovaina commented on code in PR #30499:
URL: https://github.com/apache/beam/pull/30499#discussion_r1516106352
##########
learning/prompts/documentation-lookup-nolinks/48_sdk_harness.md:
##########
@@ -0,0 +1,19 @@
+Prompt:
+What is SDK Harness in Apache Beam?
Review Comment:
```suggestion
What is SDK harness in Apache Beam?
```
##########
learning/prompts/documentation-lookup-nolinks/48_sdk_harness.md:
##########
@@ -0,0 +1,19 @@
+Prompt:
+What is SDK Harness in Apache Beam?
+
+Response:
+Interoperability between Apache Beam SDKs and runners is a key aspect of
Apache Beam. The portability framework introduces well-defined,
language-neutral data structures and protocols between the SDK and runner. This
interop layer – called the `Portability API` – ensures that SDKs and runners
can work with each other uniformly, reducing the interoperability burden for
both SDKs and runners to a constant effort.
+
+The portability API consists of a set of smaller contracts that isolate SDKs
and runners for job submission, management and execution. These contracts use
`protobuf`s and `gRPC` for broad language support. All SDKs currently support
the portability framework.
+
+The SDK harness is a SDK-provided program responsible for executing user code
and is run separately from the runner. SDK harness initialization relies on the
Provision and `Artifact API`s for obtaining staged files, pipeline options and
environment information.
+
+Apache Beam allows configuration of the SDK harness to accommodate varying
cluster setups:
+
+* **environment_type**: determines where user code will be executed:
+ * **DOCKER**: User code is executed within a container started on each
worker node. This requires docker to be installed on worker nodes (default).
Use `environment_config` to specify the Docker image URL. Official Docker
images are used by default. Alternatively, you can build your own image.
Prebuilt SDK container images are released per supported language during Beam
releases and pushed to Docker Hub.
Review Comment:
```suggestion
* `DOCKER`: executes user code within a container on each worker node.
Docker must be installed on worker nodes. You can specify the Docker image URL
using the `environment_config` parameter. Prebuilt SDK container images are
available with each Apache Beam release and pushed to Docker Hub. You can also
build your custom image.
```
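For context on the hunk above: `environment_type` and `environment_config` are ordinary portable pipeline options. A minimal sketch of passing them when launching a Python pipeline with the SDK harness in a Docker container (the runner, job endpoint, bucket paths, and image tag below are illustrative placeholders, not taken from this PR):

```
# Run wordcount with the SDK harness in a Docker container on each worker.
# Docker must be installed on the worker nodes.
python -m apache_beam.examples.wordcount \
  --input=gs://my-bucket/input.txt \
  --output=gs://my-bucket/counts \
  --runner=PortableRunner \
  --job_endpoint=localhost:8099 \
  --environment_type=DOCKER \
  --environment_config=apache/beam_python3.10_sdk:2.53.0
```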
##########
learning/prompts/documentation-lookup-nolinks/49_transform_service.md:
##########
@@ -0,0 +1,58 @@
+Prompt:
+What is a Transformer Service in Apache Beam?
+
+Response:
+The `Transform service` is Docker Compose service included into Apache Beam
SDK versions 2.49.0 and later. You can use Transform service to upgrade or
downgrade Beam SDK version of individual supported transforms in your pipeline
without changing the Beam version of the pipeline. This feature is only
supported for Beam Java SDK 2.53.0 and later.
+
+To use this feature simply execute Java pipeline with additional pipeline
options specifying the URNs of the transforms you want to upgrade and the Beam
version you want to upgrade to:
Review Comment:
```suggestion
You can utilize the Transform service to upgrade specific transforms only if
you are using Beam Java SDK 2.53.0 and later. To employ this feature, execute a
Java pipeline with additional pipeline options specifying the URNs of the
transforms you want to upgrade and the desired Apache Beam version:
```
##########
learning/prompts/documentation-lookup-nolinks/48_sdk_harness.md:
##########
@@ -0,0 +1,19 @@
+Prompt:
+What is SDK Harness in Apache Beam?
+
+Response:
+Interoperability between Apache Beam SDKs and runners is a key aspect of
Apache Beam. The portability framework introduces well-defined,
language-neutral data structures and protocols between the SDK and runner. This
interop layer – called the `Portability API` – ensures that SDKs and runners
can work with each other uniformly, reducing the interoperability burden for
both SDKs and runners to a constant effort.
+
+The portability API consists of a set of smaller contracts that isolate SDKs
and runners for job submission, management and execution. These contracts use
`protobuf`s and `gRPC` for broad language support. All SDKs currently support
the portability framework.
+
+The SDK harness is a SDK-provided program responsible for executing user code
and is run separately from the runner. SDK harness initialization relies on the
Provision and `Artifact API`s for obtaining staged files, pipeline options and
environment information.
+
+Apache Beam allows configuration of the SDK harness to accommodate varying
cluster setups:
Review Comment:
```suggestion
Apache Beam offers configuration options for the SDK harness to cater to
diverse cluster setups. These options include:
```
##########
learning/prompts/documentation-lookup/48_sdk_harness.md:
##########
@@ -0,0 +1,21 @@
+Prompt:
+What is SDK Harness in Apache Beam?
Review Comment:
```suggestion
What is SDK harness in Apache Beam?
```
##########
learning/prompts/documentation-lookup-nolinks/49_transform_service.md:
##########
@@ -0,0 +1,58 @@
+Prompt:
+What is a Transformer Service in Apache Beam?
+
+Response:
+The `Transform service` is Docker Compose service included into Apache Beam
SDK versions 2.49.0 and later. You can use Transform service to upgrade or
downgrade Beam SDK version of individual supported transforms in your pipeline
without changing the Beam version of the pipeline. This feature is only
supported for Beam Java SDK 2.53.0 and later.
+
+To use this feature simply execute Java pipeline with additional pipeline
options specifying the URNs of the transforms you want to upgrade and the Beam
version you want to upgrade to:
+
+```java
+--transformsToOverride=beam:transform:org.apache.beam:bigquery_read:v1
--transformServiceBeamVersion=2.xy.z
+```
+
+In the above example, `--transformsToOverride` specifies the URN of the
transform you want to upgrade or downgrade, and `--transformServiceBeamVersion`
specifies the Beam version you want to upgrade to.
+
+The framework will automatically download the specified version of Docker
containers for the transforms and use them in the pipeline. You must have
Docker installed on the machine that starts the service.
+
+Currently the following transforms are supported:
+
+* BigQuery read transform: `beam:transform:org.apache.beam:bigquery_read:v1`
+* BigQuery write transform: `beam:transform:org.apache.beam:bigquery_write:v1`
+* Kafka read transform:
`beam:transform:org.apache.beam:kafka_read_with_metadata:v2`
+* Kafka write transform: `beam:transform:org.apache.beam:kafka_write:v2`
+
+Transform service implement Beam expansion API. This means you can use the
Transform service to construct and execute multi-language pipelines. For
example, you can build a Python pipeline that uses Java `KafkaIO` transform and
execute in without installing Java locally.
Review Comment:
```suggestion
The Transform service implements the Beam expansion API, enabling
multi-language pipelines to leverage it for expanding supported transforms.
This feature allows you to create and run multi-language pipelines without
additional language runtimes. For instance, you can build a Python pipeline
that utilizes a Java `KafkaIO` transform without the need to install Java
locally.
```
##########
learning/prompts/documentation-lookup/49_transform_service.md:
##########
@@ -0,0 +1,58 @@
+Prompt:
+What is a Transformer Service in Apache Beam?
+
+Response:
+The `Transform service` is [Docker Compose
service](https://docs.docker.com/compose/) included into Apache Beam SDK
versions 2.49.0 and later. You can use Transform service to upgrade or
downgrade Beam SDK version of individual supported transforms in your pipeline
without changing the Beam version of the pipeline. This feature is only
supported for Beam Java SDK 2.53.0 and later.
Review Comment:
```suggestion
The Transform service is a [Docker
Compose](https://docs.docker.com/compose/) service included in Apache Beam SDK
versions 2.49.0 and later. It enables you to upgrade or downgrade the Beam SDK
version of individual supported transforms in your pipeline without changing
the overall Beam version of the pipeline. Additionally, you can utilize the
Transform service to create and execute multi-language pipelines without
needing to install support for additional language runtimes.
```
##########
learning/prompts/documentation-lookup/48_sdk_harness.md:
##########
@@ -0,0 +1,21 @@
+Prompt:
+What is SDK Harness in Apache Beam?
+
+Response:
+Interoperability between Apache Beam SDKs and runners is a key aspect of
Apache Beam. The [portability
framework](https://beam.apache.org/roadmap/portability/) introduces
well-defined, language-neutral data structures and protocols between the SDK
and runner. This interop layer – called the `Portability API` – ensures that
SDKs and runners can work with each other uniformly, reducing the
interoperability burden for both SDKs and runners to a constant effort.
+
+The portability API consists of a set of smaller contracts that isolate SDKs
and runners for job submission, management and execution. These contracts use
`protobuf`s and `gRPC` for broad language support.
+
+All SDKs currently support the portability framework. See the [Portability
support
table](https://docs.google.com/spreadsheets/d/1KDa_FGn1ShjomGd-UUDOhuh2q73de2tPz6BqHpzqvNI/edit#gid=0)
for details.
+
+The SDK harness is a SDK-provided program responsible for executing user code
and is run separately from the runner. SDK harness initialization relies on the
Provision and `Artifact API`s for obtaining staged files, pipeline options and
environment information.
+
+Apache Beam allows configuration of the SDK harness to accommodate varying
cluster setups:
+
+* **environment_type**: determines where user code will be executed:
+ * **DOCKER**: User code is executed within a container started on each
worker node. This requires docker to be installed on worker nodes (default).
Use `environment_config` to specify the Docker image URL. Official Docker
images are used by default. Alternatively, you can build your own image by
following the instructions
[here](https://beam.apache.org/documentation/runtime/environments/). Prebuilt
SDK container images are released per supported language during Beam releases
and pushed to [Docker
Hub](https://hub.docker.com/search?q=apache%2Fbeam&type=image).
+ * **PROCESS**: User code is executed by processes that are automatically
started by the runner on each worker node.
+ * **EXTERNAL**: User code will be dispatched to an external service. Use
`environment_config` to specify the address for the external service, e.g.
`localhost:50000`.
+ * **LOOPBACK**: User code is executed within the same process that submitted
the pipeline.
+
+* **sdk_worker_parallelism**: sets the number of SDK workers that run on each
worker node. The default is 1. If 0, the value is automatically set by the
runner by looking at different parameters, such as the number of CPU cores on
the worker machine.
Review Comment:
```suggestion
2. **`sdk_worker_parallelism`**: determines the number of SDK workers per
worker node. The default value is 1, but setting it to 0 enables automatic
determination by the runner based on factors like the number of CPU cores on
the worker machine.
```
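As a side note on the `sdk_worker_parallelism` option discussed in this hunk, it is passed like any other portable pipeline option; a hedged fragment (the value is illustrative):

```
# Default is 1 SDK worker per worker node; 0 lets the runner choose
# automatically based on factors such as the node's CPU core count.
--sdk_worker_parallelism=0
```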
##########
learning/prompts/documentation-lookup-nolinks/48_sdk_harness.md:
##########
@@ -0,0 +1,19 @@
+Prompt:
+What is SDK Harness in Apache Beam?
+
+Response:
+Interoperability between Apache Beam SDKs and runners is a key aspect of
Apache Beam. The portability framework introduces well-defined,
language-neutral data structures and protocols between the SDK and runner. This
interop layer – called the `Portability API` – ensures that SDKs and runners
can work with each other uniformly, reducing the interoperability burden for
both SDKs and runners to a constant effort.
+
+The portability API consists of a set of smaller contracts that isolate SDKs
and runners for job submission, management and execution. These contracts use
`protobuf`s and `gRPC` for broad language support. All SDKs currently support
the portability framework.
+
+The SDK harness is a SDK-provided program responsible for executing user code
and is run separately from the runner. SDK harness initialization relies on the
Provision and `Artifact API`s for obtaining staged files, pipeline options and
environment information.
+
+Apache Beam allows configuration of the SDK harness to accommodate varying
cluster setups:
+
+* **environment_type**: determines where user code will be executed:
+ * **DOCKER**: User code is executed within a container started on each
worker node. This requires docker to be installed on worker nodes (default).
Use `environment_config` to specify the Docker image URL. Official Docker
images are used by default. Alternatively, you can build your own image.
Prebuilt SDK container images are released per supported language during Beam
releases and pushed to Docker Hub.
+ * **PROCESS**: User code is executed by processes that are automatically
started by the runner on each worker node.
+ * **EXTERNAL**: User code will be dispatched to an external service. Use
`environment_config` to specify the address for the external service, e.g.
`localhost:50000`.
+ * **LOOPBACK**: User code is executed within the same process that submitted
the pipeline.
+
+* **sdk_worker_parallelism**: sets the number of SDK workers that run on each
worker node. The default is 1. If 0, the value is automatically set by the
runner by looking at different parameters, such as the number of CPU cores on
the worker machine.
Review Comment:
```suggestion
2. **`sdk_worker_parallelism`**: determines the number of SDK workers per
worker node. The default value is 1, but setting it to 0 enables automatic
determination by the runner based on factors like the number of CPU cores on
the worker machine.
```
##########
learning/prompts/documentation-lookup-nolinks/48_sdk_harness.md:
##########
@@ -0,0 +1,19 @@
+Prompt:
+What is SDK Harness in Apache Beam?
+
+Response:
+Interoperability between Apache Beam SDKs and runners is a key aspect of
Apache Beam. The portability framework introduces well-defined,
language-neutral data structures and protocols between the SDK and runner. This
interop layer – called the `Portability API` – ensures that SDKs and runners
can work with each other uniformly, reducing the interoperability burden for
both SDKs and runners to a constant effort.
+
+The portability API consists of a set of smaller contracts that isolate SDKs
and runners for job submission, management and execution. These contracts use
`protobuf`s and `gRPC` for broad language support. All SDKs currently support
the portability framework.
+
+The SDK harness is a SDK-provided program responsible for executing user code
and is run separately from the runner. SDK harness initialization relies on the
Provision and `Artifact API`s for obtaining staged files, pipeline options and
environment information.
+
+Apache Beam allows configuration of the SDK harness to accommodate varying
cluster setups:
+
Review Comment:
```suggestion
```
##########
learning/prompts/documentation-lookup-nolinks/48_sdk_harness.md:
##########
@@ -0,0 +1,19 @@
+Prompt:
+What is SDK Harness in Apache Beam?
+
+Response:
+Interoperability between Apache Beam SDKs and runners is a key aspect of
Apache Beam. The portability framework introduces well-defined,
language-neutral data structures and protocols between the SDK and runner. This
interop layer – called the `Portability API` – ensures that SDKs and runners
can work with each other uniformly, reducing the interoperability burden for
both SDKs and runners to a constant effort.
+
+The portability API consists of a set of smaller contracts that isolate SDKs
and runners for job submission, management and execution. These contracts use
`protobuf`s and `gRPC` for broad language support. All SDKs currently support
the portability framework.
+
+The SDK harness is a SDK-provided program responsible for executing user code
and is run separately from the runner. SDK harness initialization relies on the
Provision and `Artifact API`s for obtaining staged files, pipeline options and
environment information.
+
+Apache Beam allows configuration of the SDK harness to accommodate varying
cluster setups:
+
+* **environment_type**: determines where user code will be executed:
+ * **DOCKER**: User code is executed within a container started on each
worker node. This requires docker to be installed on worker nodes (default).
Use `environment_config` to specify the Docker image URL. Official Docker
images are used by default. Alternatively, you can build your own image.
Prebuilt SDK container images are released per supported language during Beam
releases and pushed to Docker Hub.
+ * **PROCESS**: User code is executed by processes that are automatically
started by the runner on each worker node.
Review Comment:
```suggestion
* `PROCESS`: executes user code through processes that are automatically
initiated by the runner on each worker node.
```
##########
learning/prompts/documentation-lookup-nolinks/48_sdk_harness.md:
##########
@@ -0,0 +1,19 @@
+Prompt:
+What is SDK Harness in Apache Beam?
+
+Response:
+Interoperability between Apache Beam SDKs and runners is a key aspect of
Apache Beam. The portability framework introduces well-defined,
language-neutral data structures and protocols between the SDK and runner. This
interop layer – called the `Portability API` – ensures that SDKs and runners
can work with each other uniformly, reducing the interoperability burden for
both SDKs and runners to a constant effort.
Review Comment:
```suggestion
Interoperability between Apache Beam SDKs and runners is a key aspect of
Apache Beam. The portability framework introduces well-defined,
language-neutral data structures and protocols between the SDK and runner. This
interoperability layer, known as the 'portability API', ensures that SDKs and
runners can seamlessly work with each other, reducing the interoperability
burden for both SDKs and runners to a constant effort.
```
##########
learning/prompts/documentation-lookup/49_transform_service.md:
##########
@@ -0,0 +1,58 @@
+Prompt:
+What is a Transformer Service in Apache Beam?
+
+Response:
+The `Transform service` is [Docker Compose
service](https://docs.docker.com/compose/) included into Apache Beam SDK
versions 2.49.0 and later. You can use Transform service to upgrade or
downgrade Beam SDK version of individual supported transforms in your pipeline
without changing the Beam version of the pipeline. This feature is only
supported for Beam Java SDK 2.53.0 and later.
+
+To use this feature simply execute Java pipeline with additional pipeline
options specifying the URNs of the transforms you want to upgrade and the Beam
version you want to upgrade to:
+
+```java
+--transformsToOverride=beam:transform:org.apache.beam:bigquery_read:v1
--transformServiceBeamVersion=2.xy.z
+```
+
+In the above example, `--transformsToOverride` specifies the URN of the
transform you want to upgrade or downgrade, and `--transformServiceBeamVersion`
specifies the Beam version you want to upgrade to.
+
+The framework will automatically download the specified version of Docker
containers for the transforms and use them in the pipeline. You must have
Docker installed on the machine that starts the service.
+
+Currently the following transforms are supported:
+
+* BigQuery read transform: `beam:transform:org.apache.beam:bigquery_read:v1`
+* BigQuery write transform: `beam:transform:org.apache.beam:bigquery_write:v1`
+* Kafka read transform:
`beam:transform:org.apache.beam:kafka_read_with_metadata:v2`
+* Kafka write transform: `beam:transform:org.apache.beam:kafka_write:v2`
+
+Transform service implement Beam expansion API. This means you can use the
Transform service to construct and execute multi-language pipelines. For
example, you can build a Python pipeline that uses Java `KafkaIO` transform and
execute in without installing Java locally.
Review Comment:
```suggestion
The Transform service implements the Beam expansion API, enabling
multi-language pipelines to leverage it for expanding supported transforms.
This feature allows you to create and run multi-language pipelines without
additional language runtimes. For instance, you can build a Python pipeline
that utilizes a Java `KafkaIO` transform without the need to install Java
locally.
```
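For context on the multi-language scenario the suggested text describes: a hedged sketch of launching such a pipeline, assuming Docker is available on the launching machine (the script name is a placeholder; the Java `KafkaIO` expansion happens in a container, so no local Java installation is needed):

```
# Run a Python pipeline that uses the Java KafkaIO transform; the
# transform is expanded via a containerized service rather than a
# locally installed Java runtime.
python kafka_pipeline.py --runner=DirectRunner
```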
##########
learning/prompts/documentation-lookup-nolinks/48_sdk_harness.md:
##########
@@ -0,0 +1,19 @@
+Prompt:
+What is SDK Harness in Apache Beam?
+
+Response:
+Interoperability between Apache Beam SDKs and runners is a key aspect of
Apache Beam. The portability framework introduces well-defined,
language-neutral data structures and protocols between the SDK and runner. This
interop layer – called the `Portability API` – ensures that SDKs and runners
can work with each other uniformly, reducing the interoperability burden for
both SDKs and runners to a constant effort.
+
+The portability API consists of a set of smaller contracts that isolate SDKs
and runners for job submission, management and execution. These contracts use
`protobuf`s and `gRPC` for broad language support. All SDKs currently support
the portability framework.
+
+The SDK harness is a SDK-provided program responsible for executing user code
and is run separately from the runner. SDK harness initialization relies on the
Provision and `Artifact API`s for obtaining staged files, pipeline options and
environment information.
Review Comment:
```suggestion
The SDK harness is a program responsible for executing user code. This
program is provided by an SDK and runs separately from the runner. SDK harness
initialization relies on the provision and artifact APIs for obtaining staged
files, pipeline options, and environment information.
```
##########
learning/prompts/documentation-lookup/49_transform_service.md:
##########
@@ -0,0 +1,58 @@
+Prompt:
+What is a Transformer Service in Apache Beam?
+
+Response:
+The `Transform service` is [Docker Compose
service](https://docs.docker.com/compose/) included into Apache Beam SDK
versions 2.49.0 and later. You can use Transform service to upgrade or
downgrade Beam SDK version of individual supported transforms in your pipeline
without changing the Beam version of the pipeline. This feature is only
supported for Beam Java SDK 2.53.0 and later.
+
+To use this feature simply execute Java pipeline with additional pipeline
options specifying the URNs of the transforms you want to upgrade and the Beam
version you want to upgrade to:
+
+```java
+--transformsToOverride=beam:transform:org.apache.beam:bigquery_read:v1
--transformServiceBeamVersion=2.xy.z
+```
+
+In the above example, `--transformsToOverride` specifies the URN of the
transform you want to upgrade or downgrade, and `--transformServiceBeamVersion`
specifies the Beam version you want to upgrade to.
+
+The framework will automatically download the specified version of Docker
containers for the transforms and use them in the pipeline. You must have
Docker installed on the machine that starts the service.
+
+Currently the following transforms are supported:
+
Review Comment:
```suggestion
Currently, the Transform service can upgrade the following transforms:
```
##########
learning/prompts/documentation-lookup-nolinks/48_sdk_harness.md:
##########
@@ -0,0 +1,19 @@
+Prompt:
+What is SDK Harness in Apache Beam?
+
+Response:
+Interoperability between Apache Beam SDKs and runners is a key aspect of
Apache Beam. The portability framework introduces well-defined,
language-neutral data structures and protocols between the SDK and runner. This
interop layer – called the `Portability API` – ensures that SDKs and runners
can work with each other uniformly, reducing the interoperability burden for
both SDKs and runners to a constant effort.
+
+The portability API consists of a set of smaller contracts that isolate SDKs
and runners for job submission, management and execution. These contracts use
`protobuf`s and `gRPC` for broad language support. All SDKs currently support
the portability framework.
+
+The SDK harness is a SDK-provided program responsible for executing user code
and is run separately from the runner. SDK harness initialization relies on the
Provision and `Artifact API`s for obtaining staged files, pipeline options and
environment information.
+
+Apache Beam allows configuration of the SDK harness to accommodate varying
cluster setups:
+
+* **environment_type**: determines where user code will be executed:
+ * **DOCKER**: User code is executed within a container started on each
worker node. This requires docker to be installed on worker nodes (default).
Use `environment_config` to specify the Docker image URL. Official Docker
images are used by default. Alternatively, you can build your own image.
Prebuilt SDK container images are released per supported language during Beam
releases and pushed to Docker Hub.
+ * **PROCESS**: User code is executed by processes that are automatically
started by the runner on each worker node.
+ * **EXTERNAL**: User code will be dispatched to an external service. Use
`environment_config` to specify the address for the external service, e.g.
`localhost:50000`.
Review Comment:
```suggestion
* `EXTERNAL`: dispatches user code to an external service. Use the
`environment_config` parameter to specify the service address, for example,
`localhost:50000`.
```
##########
learning/prompts/documentation-lookup-nolinks/48_sdk_harness.md:
##########
@@ -0,0 +1,19 @@
+Prompt:
+What is SDK Harness in Apache Beam?
+
+Response:
+Interoperability between Apache Beam SDKs and runners is a key aspect of
Apache Beam. The portability framework introduces well-defined,
language-neutral data structures and protocols between the SDK and runner. This
interop layer – called the `Portability API` – ensures that SDKs and runners
can work with each other uniformly, reducing the interoperability burden for
both SDKs and runners to a constant effort.
+
+The portability API consists of a set of smaller contracts that isolate SDKs
and runners for job submission, management and execution. These contracts use
`protobuf`s and `gRPC` for broad language support. All SDKs currently support
the portability framework.
Review Comment:
```suggestion
The portability API consists of a set of smaller contracts that isolate SDKs
and runners for job submission, management, and execution. These contracts
utilize protocols like `protobuf` and `gRPC` to provide broad language support.
Currently, all SDKs support the portability framework.
```
##########
learning/prompts/documentation-lookup-nolinks/48_sdk_harness.md:
##########
@@ -0,0 +1,19 @@
+Prompt:
+What is SDK Harness in Apache Beam?
+
+Response:
+Interoperability between Apache Beam SDKs and runners is a key aspect of
Apache Beam. The portability framework introduces well-defined,
language-neutral data structures and protocols between the SDK and runner. This
interop layer – called the `Portability API` – ensures that SDKs and runners
can work with each other uniformly, reducing the interoperability burden for
both SDKs and runners to a constant effort.
+
+The portability API consists of a set of smaller contracts that isolate SDKs
and runners for job submission, management and execution. These contracts use
`protobuf`s and `gRPC` for broad language support. All SDKs currently support
the portability framework.
+
+The SDK harness is a SDK-provided program responsible for executing user code
and is run separately from the runner. SDK harness initialization relies on the
Provision and `Artifact API`s for obtaining staged files, pipeline options and
environment information.
+
+Apache Beam allows configuration of the SDK harness to accommodate varying
cluster setups:
+
+* **environment_type**: determines where user code will be executed:
+ * **DOCKER**: User code is executed within a container started on each
worker node. This requires docker to be installed on worker nodes (default).
Use `environment_config` to specify the Docker image URL. Official Docker
images are used by default. Alternatively, you can build your own image.
Prebuilt SDK container images are released per supported language during Beam
releases and pushed to Docker Hub.
+ * **PROCESS**: User code is executed by processes that are automatically
started by the runner on each worker node.
+ * **EXTERNAL**: User code will be dispatched to an external service. Use
`environment_config` to specify the address for the external service, e.g.
`localhost:50000`.
+ * **LOOPBACK**: User code is executed within the same process that submitted
the pipeline.
Review Comment:
```suggestion
* `LOOPBACK`: executes user code within the same process that submitted
the pipeline.
```
##########
learning/prompts/documentation-lookup-nolinks/49_transform_service.md:
##########
@@ -0,0 +1,58 @@
+Prompt:
+What is a Transformer Service in Apache Beam?
+
+Response:
+The `Transform service` is Docker Compose service included into Apache Beam
SDK versions 2.49.0 and later. You can use Transform service to upgrade or
downgrade Beam SDK version of individual supported transforms in your pipeline
without changing the Beam version of the pipeline. This feature is only
supported for Beam Java SDK 2.53.0 and later.
+
+To use this feature simply execute Java pipeline with additional pipeline
options specifying the URNs of the transforms you want to upgrade and the Beam
version you want to upgrade to:
+
+```java
+--transformsToOverride=beam:transform:org.apache.beam:bigquery_read:v1
--transformServiceBeamVersion=2.xy.z
+```
+
+In the above example, `--transformsToOverride` specifies the URN of the
transform you want to upgrade or downgrade, and `--transformServiceBeamVersion`
specifies the Beam version you want to upgrade to.
Review Comment:
```suggestion
In the provided example, `--transformsToOverride` specifies the URN of the
transform to upgrade or downgrade, while `--transformServiceBeamVersion`
specifies the target Beam version.
```
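For context, the two options discussed in this hunk are appended to an otherwise ordinary Java pipeline invocation; a hedged sketch (the jar name, runner, and version number are illustrative):

```
# Upgrade only the BigQuery read transform to the specified Beam version
# while the rest of the pipeline stays on its current Beam version.
# Docker must be installed on the machine that starts the service.
java -jar my-pipeline.jar \
  --runner=DirectRunner \
  --transformsToOverride=beam:transform:org.apache.beam:bigquery_read:v1 \
  --transformServiceBeamVersion=2.53.0
```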
##########
learning/prompts/documentation-lookup-nolinks/48_sdk_harness.md:
##########
@@ -0,0 +1,19 @@
+Prompt:
+What is SDK Harness in Apache Beam?
+
+Response:
+Interoperability between Apache Beam SDKs and runners is a key aspect of
Apache Beam. The portability framework introduces well-defined,
language-neutral data structures and protocols between the SDK and runner. This
interop layer – called the `Portability API` – ensures that SDKs and runners
can work with each other uniformly, reducing the interoperability burden for
both SDKs and runners to a constant effort.
+
+The portability API consists of a set of smaller contracts that isolate SDKs
and runners for job submission, management and execution. These contracts use
`protobuf`s and `gRPC` for broad language support. All SDKs currently support
the portability framework.
+
+The SDK harness is a SDK-provided program responsible for executing user code
and is run separately from the runner. SDK harness initialization relies on the
Provision and `Artifact API`s for obtaining staged files, pipeline options and
environment information.
+
+Apache Beam allows configuration of the SDK harness to accommodate varying
cluster setups:
+
+* **environment_type**: determines where user code will be executed:
Review Comment:
```suggestion
1. **`environment_type`**: determines where user code is executed. The
`environment_config` parameter configures the environment based on the value of
`environment_type`:
```
##########
learning/prompts/documentation-lookup-nolinks/49_transform_service.md:
##########
@@ -0,0 +1,58 @@
+Prompt:
+What is a Transformer Service in Apache Beam?
+
+Response:
+The `Transform service` is Docker Compose service included into Apache Beam
SDK versions 2.49.0 and later. You can use Transform service to upgrade or
downgrade Beam SDK version of individual supported transforms in your pipeline
without changing the Beam version of the pipeline. This feature is only
supported for Beam Java SDK 2.53.0 and later.
+
+To use this feature simply execute Java pipeline with additional pipeline
options specifying the URNs of the transforms you want to upgrade and the Beam
version you want to upgrade to:
+
+```java
+--transformsToOverride=beam:transform:org.apache.beam:bigquery_read:v1
--transformServiceBeamVersion=2.xy.z
+```
+
+In the above example, `--transformsToOverride` specifies the URN of the
transform you want to upgrade or downgrade, and `--transformServiceBeamVersion`
specifies the Beam version you want to upgrade to.
+
+The framework will automatically download the specified version of Docker
containers for the transforms and use them in the pipeline. You must have
Docker installed on the machine that starts the service.
+
+Currently the following transforms are supported:
+
Review Comment:
```suggestion
```
##########
learning/prompts/documentation-lookup-nolinks/49_transform_service.md:
##########
@@ -0,0 +1,58 @@
+Prompt:
+What is a Transformer Service in Apache Beam?
+
+Response:
+The `Transform service` is Docker Compose service included into Apache Beam
SDK versions 2.49.0 and later. You can use Transform service to upgrade or
downgrade Beam SDK version of individual supported transforms in your pipeline
without changing the Beam version of the pipeline. This feature is only
supported for Beam Java SDK 2.53.0 and later.
+
+To use this feature simply execute Java pipeline with additional pipeline
options specifying the URNs of the transforms you want to upgrade and the Beam
version you want to upgrade to:
+
+```java
+--transformsToOverride=beam:transform:org.apache.beam:bigquery_read:v1
--transformServiceBeamVersion=2.xy.z
+```
+
+In the above example, `--transformsToOverride` specifies the URN of the
transform you want to upgrade or downgrade, and `--transformServiceBeamVersion`
specifies the Beam version you want to upgrade to.
+
+The framework will automatically download the specified version of Docker
containers for the transforms and use them in the pipeline. You must have
Docker installed on the machine that starts the service.
+
+Currently the following transforms are supported:
Review Comment:
```suggestion
Currently, the Transform service can upgrade the following transforms:
```
##########
learning/prompts/documentation-lookup-nolinks/49_transform_service.md:
##########
@@ -0,0 +1,58 @@
+Prompt:
+What is a Transformer Service in Apache Beam?
+
+Response:
+The `Transform service` is Docker Compose service included into Apache Beam
SDK versions 2.49.0 and later. You can use Transform service to upgrade or
downgrade Beam SDK version of individual supported transforms in your pipeline
without changing the Beam version of the pipeline. This feature is only
supported for Beam Java SDK 2.53.0 and later.
Review Comment:
```suggestion
The Transform service is a Docker Compose service included in Apache Beam
SDK versions 2.49.0 and later. It enables you to upgrade or downgrade the Beam
SDK version of individual supported transforms in your pipeline without
changing the overall Beam version of the pipeline. Additionally, you can
utilize the Transform service to create and execute multi-language pipelines
without needing to install support for additional language runtimes.
```
##########
learning/prompts/documentation-lookup/49_transform_service.md:
##########
@@ -0,0 +1,58 @@
+Prompt:
+What is a Transformer Service in Apache Beam?
+
+Response:
+The `Transform service` is [Docker Compose
service](https://docs.docker.com/compose/) included into Apache Beam SDK
versions 2.49.0 and later. You can use Transform service to upgrade or
downgrade Beam SDK version of individual supported transforms in your pipeline
without changing the Beam version of the pipeline. This feature is only
supported for Beam Java SDK 2.53.0 and later.
+
+To use this feature simply execute Java pipeline with additional pipeline
options specifying the URNs of the transforms you want to upgrade and the Beam
version you want to upgrade to:
+
+```java
+--transformsToOverride=beam:transform:org.apache.beam:bigquery_read:v1
--transformServiceBeamVersion=2.xy.z
+```
+
+In the above example, `--transformsToOverride` specifies the URN of the
transform you want to upgrade or downgrade, and `--transformServiceBeamVersion`
specifies the Beam version you want to upgrade to.
+
+The framework will automatically download the specified version of Docker
containers for the transforms and use them in the pipeline. You must have
Docker installed on the machine that starts the service.
+
+Currently the following transforms are supported:
+
+* BigQuery read transform: `beam:transform:org.apache.beam:bigquery_read:v1`
+* BigQuery write transform: `beam:transform:org.apache.beam:bigquery_write:v1`
+* Kafka read transform:
`beam:transform:org.apache.beam:kafka_read_with_metadata:v2`
+* Kafka write transform: `beam:transform:org.apache.beam:kafka_write:v2`
+
+Transform service implement Beam expansion API. This means you can use the
Transform service to construct and execute multi-language pipelines. For
example, you can build a Python pipeline that uses Java `KafkaIO` transform and
execute in without installing Java locally.
+
+Transform service can be started automatically by Apache Beam SDK or manually
by users:
Review Comment:
```suggestion
In some cases, Apache Beam SDKs can start the Transform service
automatically, provided that Docker is available locally. You can also start
the Transform service manually by running the following command:
```
##########
learning/prompts/documentation-lookup-nolinks/49_transform_service.md:
##########
@@ -0,0 +1,58 @@
+Prompt:
+What is a Transformer Service in Apache Beam?
Review Comment:
```suggestion
What is Transform service in Apache Beam?
```
##########
learning/prompts/documentation-lookup-nolinks/49_transform_service.md:
##########
@@ -0,0 +1,58 @@
+Prompt:
+What is a Transformer Service in Apache Beam?
+
+Response:
+The `Transform service` is Docker Compose service included into Apache Beam
SDK versions 2.49.0 and later. You can use Transform service to upgrade or
downgrade Beam SDK version of individual supported transforms in your pipeline
without changing the Beam version of the pipeline. This feature is only
supported for Beam Java SDK 2.53.0 and later.
+
+To use this feature simply execute Java pipeline with additional pipeline
options specifying the URNs of the transforms you want to upgrade and the Beam
version you want to upgrade to:
+
+```java
+--transformsToOverride=beam:transform:org.apache.beam:bigquery_read:v1
--transformServiceBeamVersion=2.xy.z
+```
+
+In the above example, `--transformsToOverride` specifies the URN of the
transform you want to upgrade or downgrade, and `--transformServiceBeamVersion`
specifies the Beam version you want to upgrade to.
+
+The framework will automatically download the specified version of Docker
containers for the transforms and use them in the pipeline. You must have
Docker installed on the machine that starts the service.
+
+Currently the following transforms are supported:
+
+* BigQuery read transform: `beam:transform:org.apache.beam:bigquery_read:v1`
+* BigQuery write transform: `beam:transform:org.apache.beam:bigquery_write:v1`
+* Kafka read transform:
`beam:transform:org.apache.beam:kafka_read_with_metadata:v2`
+* Kafka write transform: `beam:transform:org.apache.beam:kafka_write:v2`
Review Comment:
```suggestion
* BigQuery read: `beam:transform:org.apache.beam:bigquery_read:v1`
* BigQuery write: `beam:transform:org.apache.beam:bigquery_write:v1`
* Kafka read: `beam:transform:org.apache.beam:kafka_read_with_metadata:v2`
* Kafka write: `beam:transform:org.apache.beam:kafka_write:v2`
```
##########
learning/prompts/documentation-lookup-nolinks/49_transform_service.md:
##########
@@ -0,0 +1,58 @@
+Prompt:
+What is a Transformer Service in Apache Beam?
+
+Response:
+The `Transform service` is Docker Compose service included into Apache Beam
SDK versions 2.49.0 and later. You can use Transform service to upgrade or
downgrade Beam SDK version of individual supported transforms in your pipeline
without changing the Beam version of the pipeline. This feature is only
supported for Beam Java SDK 2.53.0 and later.
+
+To use this feature simply execute Java pipeline with additional pipeline
options specifying the URNs of the transforms you want to upgrade and the Beam
version you want to upgrade to:
+
+```java
+--transformsToOverride=beam:transform:org.apache.beam:bigquery_read:v1
--transformServiceBeamVersion=2.xy.z
+```
+
+In the above example, `--transformsToOverride` specifies the URN of the
transform you want to upgrade or downgrade, and `--transformServiceBeamVersion`
specifies the Beam version you want to upgrade to.
+
+The framework will automatically download the specified version of Docker
containers for the transforms and use them in the pipeline. You must have
Docker installed on the machine that starts the service.
+
+Currently the following transforms are supported:
+
+* BigQuery read transform: `beam:transform:org.apache.beam:bigquery_read:v1`
+* BigQuery write transform: `beam:transform:org.apache.beam:bigquery_write:v1`
+* Kafka read transform:
`beam:transform:org.apache.beam:kafka_read_with_metadata:v2`
+* Kafka write transform: `beam:transform:org.apache.beam:kafka_write:v2`
+
+Transform service implement Beam expansion API. This means you can use the
Transform service to construct and execute multi-language pipelines. For
example, you can build a Python pipeline that uses Java `KafkaIO` transform and
execute in without installing Java locally.
+
+Transform service can be started automatically by Apache Beam SDK or manually
by users:
Review Comment:
```suggestion
In some cases, Apache Beam SDKs can start the Transform service
automatically, provided that Docker is available locally. You can also start
the Transform service manually by running the following command:
```
##########
learning/prompts/documentation-lookup-nolinks/49_transform_service.md:
##########
@@ -0,0 +1,58 @@
+Prompt:
+What is a Transformer Service in Apache Beam?
+
+Response:
+The `Transform service` is Docker Compose service included into Apache Beam
SDK versions 2.49.0 and later. You can use Transform service to upgrade or
downgrade Beam SDK version of individual supported transforms in your pipeline
without changing the Beam version of the pipeline. This feature is only
supported for Beam Java SDK 2.53.0 and later.
+
+To use this feature simply execute Java pipeline with additional pipeline
options specifying the URNs of the transforms you want to upgrade and the Beam
version you want to upgrade to:
+
+```java
+--transformsToOverride=beam:transform:org.apache.beam:bigquery_read:v1
--transformServiceBeamVersion=2.xy.z
+```
+
+In the above example, `--transformsToOverride` specifies the URN of the
transform you want to upgrade or downgrade, and `--transformServiceBeamVersion`
specifies the Beam version you want to upgrade to.
+
+The framework will automatically download the specified version of Docker
containers for the transforms and use them in the pipeline. You must have
Docker installed on the machine that starts the service.
Review Comment:
```suggestion
The framework automatically downloads the specified version of Docker
containers for the transforms and uses them in the pipeline. You must have
Docker installed on the machine that starts the service.
```
##########
learning/prompts/documentation-lookup/48_sdk_harness.md:
##########
@@ -0,0 +1,21 @@
+Prompt:
+What is SDK Harness in Apache Beam?
+
+Response:
+Interoperability between Apache Beam SDKs and runners is a key aspect of
Apache Beam. The [portability
framework](https://beam.apache.org/roadmap/portability/) introduces
well-defined, language-neutral data structures and protocols between the SDK
and runner. This interop layer – called the `Portability API` – ensures that
SDKs and runners can work with each other uniformly, reducing the
interoperability burden for both SDKs and runners to a constant effort.
+
+The portability API consists of a set of smaller contracts that isolate SDKs
and runners for job submission, management and execution. These contracts use
`protobuf`s and `gRPC` for broad language support.
Review Comment:
```suggestion
The portability API consists of a set of smaller contracts that isolate SDKs
and runners for job submission, management, and execution. These contracts
utilize protocols like `protobuf` and `gRPC` to provide broad language support.
```
##########
learning/prompts/documentation-lookup/48_sdk_harness.md:
##########
@@ -0,0 +1,21 @@
+Prompt:
+What is SDK Harness in Apache Beam?
+
+Response:
+Interoperability between Apache Beam SDKs and runners is a key aspect of
Apache Beam. The [portability
framework](https://beam.apache.org/roadmap/portability/) introduces
well-defined, language-neutral data structures and protocols between the SDK
and runner. This interop layer – called the `Portability API` – ensures that
SDKs and runners can work with each other uniformly, reducing the
interoperability burden for both SDKs and runners to a constant effort.
Review Comment:
```suggestion
Interoperability between Apache Beam SDKs and runners is a key aspect of
Apache Beam. The [portability
framework](https://beam.apache.org/roadmap/portability/) introduces
well-defined, language-neutral data structures and protocols between the SDK
and runner. This interoperability layer, known as the 'portability API',
ensures that SDKs and runners can seamlessly work with each other, reducing the
interoperability burden for both SDKs and runners to a constant effort.
```
##########
learning/prompts/documentation-lookup-nolinks/48_sdk_harness.md:
##########
@@ -0,0 +1,19 @@
+Prompt:
+What is SDK Harness in Apache Beam?
+
+Response:
+Interoperability between Apache Beam SDKs and runners is a key aspect of
Apache Beam. The portability framework introduces well-defined,
language-neutral data structures and protocols between the SDK and runner. This
interop layer – called the `Portability API` – ensures that SDKs and runners
can work with each other uniformly, reducing the interoperability burden for
both SDKs and runners to a constant effort.
+
+The portability API consists of a set of smaller contracts that isolate SDKs
and runners for job submission, management and execution. These contracts use
`protobuf`s and `gRPC` for broad language support. All SDKs currently support
the portability framework.
+
+The SDK harness is a SDK-provided program responsible for executing user code
and is run separately from the runner. SDK harness initialization relies on the
Provision and `Artifact API`s for obtaining staged files, pipeline options and
environment information.
+
+Apache Beam allows configuration of the SDK harness to accommodate varying
cluster setups:
+
+* **environment_type**: determines where user code will be executed:
+ * **DOCKER**: User code is executed within a container started on each
worker node. This requires docker to be installed on worker nodes (default).
Use `environment_config` to specify the Docker image URL. Official Docker
images are used by default. Alternatively, you can build your own image.
Prebuilt SDK container images are released per supported language during Beam
releases and pushed to Docker Hub.
+ * **PROCESS**: User code is executed by processes that are automatically
started by the runner on each worker node.
+ * **EXTERNAL**: User code will be dispatched to an external service. Use
`environment_config` to specify the address for the external service, e.g.
`localhost:50000`.
+ * **LOOPBACK**: User code is executed within the same process that submitted
the pipeline.
+
Review Comment:
```suggestion
```
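The `environment_type` / `environment_config` pairing discussed in this hunk can be illustrated with a small standalone sketch. This is a hypothetical helper written for illustration only; it is not part of the Apache Beam API:

```python
# Hypothetical validator mirroring the environment_type values described in
# the reviewed text; not part of Apache Beam itself.
VALID_ENVIRONMENT_TYPES = {"DOCKER", "PROCESS", "EXTERNAL", "LOOPBACK"}


def validate_environment(environment_type, environment_config=None):
    """Return the (type, config) pair, enforcing the pairing rules above."""
    if environment_type not in VALID_ENVIRONMENT_TYPES:
        raise ValueError(f"unknown environment_type: {environment_type!r}")
    # EXTERNAL dispatches user code to an external service, so an address
    # such as "localhost:50000" must accompany it.
    if environment_type == "EXTERNAL" and not environment_config:
        raise ValueError("EXTERNAL requires environment_config (service address)")
    return environment_type, environment_config
```

For example, `validate_environment("EXTERNAL", "localhost:50000")` succeeds, while omitting the address raises a `ValueError`.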
##########
learning/prompts/documentation-lookup-nolinks/49_transform_service.md:
##########
@@ -0,0 +1,58 @@
+Prompt:
+What is a Transformer Service in Apache Beam?
+
+Response:
+The `Transform service` is Docker Compose service included into Apache Beam
SDK versions 2.49.0 and later. You can use Transform service to upgrade or
downgrade Beam SDK version of individual supported transforms in your pipeline
without changing the Beam version of the pipeline. This feature is only
supported for Beam Java SDK 2.53.0 and later.
+
+To use this feature simply execute Java pipeline with additional pipeline
options specifying the URNs of the transforms you want to upgrade and the Beam
version you want to upgrade to:
+
+```java
+--transformsToOverride=beam:transform:org.apache.beam:bigquery_read:v1
--transformServiceBeamVersion=2.xy.z
+```
+
+In the above example, `--transformsToOverride` specifies the URN of the
transform you want to upgrade or downgrade, and `--transformServiceBeamVersion`
specifies the Beam version you want to upgrade to.
+
+The framework will automatically download the specified version of Docker
containers for the transforms and use them in the pipeline. You must have
Docker installed on the machine that starts the service.
+
+Currently the following transforms are supported:
+
+* BigQuery read transform: `beam:transform:org.apache.beam:bigquery_read:v1`
+* BigQuery write transform: `beam:transform:org.apache.beam:bigquery_write:v1`
+* Kafka read transform:
`beam:transform:org.apache.beam:kafka_read_with_metadata:v2`
+* Kafka write transform: `beam:transform:org.apache.beam:kafka_write:v2`
+
+Transform service implement Beam expansion API. This means you can use the
Transform service to construct and execute multi-language pipelines. For
example, you can build a Python pipeline that uses Java `KafkaIO` transform and
execute in without installing Java locally.
+
+Transform service can be started automatically by Apache Beam SDK or manually
by users:
+
+```java
+java -jar beam-sdks-java-transform-service-app-<Beam version for the jar>.jar
--port <port> --beam_version <Beam version for the transform service>
--project_name <a unique ID for the transform service> --command up
+```
+
+Beam transform service includes a number of transforms implemented in the
Apache Beam Java and Python SDKs:
+
+* Java transforms: Google Cloud I/O connectors, the Kafka I/O connector, and
the JDBC I/O connector
+* Python transforms: all portable transforms implemented within the Apache
Beam Python SDK, such as RunInference and DataFrame transforms.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Review Comment:
```suggestion
The Beam Transform service includes several portable transforms implemented
in the Apache Beam Java and Python SDKs:
* Java transforms: Google Cloud I/O connectors, the Kafka I/O connector, and
the JDBC I/O connector.
* Python transforms: all portable transforms implemented within the Apache
Beam Python SDK, such as RunInference and DataFrame transforms.
```
##########
learning/prompts/documentation-lookup/48_sdk_harness.md:
##########
@@ -0,0 +1,21 @@
+Prompt:
+What is SDK Harness in Apache Beam?
+
+Response:
+Interoperability between Apache Beam SDKs and runners is a key aspect of
Apache Beam. The [portability
framework](https://beam.apache.org/roadmap/portability/) introduces
well-defined, language-neutral data structures and protocols between the SDK
and runner. This interop layer – called the `Portability API` – ensures that
SDKs and runners can work with each other uniformly, reducing the
interoperability burden for both SDKs and runners to a constant effort.
+
+The portability API consists of a set of smaller contracts that isolate SDKs
and runners for job submission, management and execution. These contracts use
`protobuf`s and `gRPC` for broad language support.
+
+All SDKs currently support the portability framework. See the [Portability
support
table](https://docs.google.com/spreadsheets/d/1KDa_FGn1ShjomGd-UUDOhuh2q73de2tPz6BqHpzqvNI/edit#gid=0)
for details.
+
+The SDK harness is a SDK-provided program responsible for executing user code
and is run separately from the runner. SDK harness initialization relies on the
Provision and `Artifact API`s for obtaining staged files, pipeline options and
environment information.
Review Comment:
```suggestion
The SDK harness is a program responsible for executing user code. This
program is provided by an SDK and runs separately from the runner. SDK harness
initialization relies on the provision and artifact APIs for obtaining staged
files, pipeline options, and environment information.
```
##########
learning/prompts/documentation-lookup/48_sdk_harness.md:
##########
@@ -0,0 +1,21 @@
+Prompt:
+What is SDK Harness in Apache Beam?
+
+Response:
+Interoperability between Apache Beam SDKs and runners is a key aspect of
Apache Beam. The [portability
framework](https://beam.apache.org/roadmap/portability/) introduces
well-defined, language-neutral data structures and protocols between the SDK
and runner. This interop layer – called the `Portability API` – ensures that
SDKs and runners can work with each other uniformly, reducing the
interoperability burden for both SDKs and runners to a constant effort.
+
+The portability API consists of a set of smaller contracts that isolate SDKs
and runners for job submission, management and execution. These contracts use
`protobuf`s and `gRPC` for broad language support.
+
+All SDKs currently support the portability framework. See the [Portability
support
table](https://docs.google.com/spreadsheets/d/1KDa_FGn1ShjomGd-UUDOhuh2q73de2tPz6BqHpzqvNI/edit#gid=0)
for details.
Review Comment:
```suggestion
Currently, all SDKs support the portability framework. For the latest
information on portability support across SDKs, features, and runners, refer to
the [Apache Beam Portability Support
Matrix](https://docs.google.com/spreadsheets/d/1KDa_FGn1ShjomGd-UUDOhuh2q73de2tPz6BqHpzqvNI/edit#gid=0).
```
##########
learning/prompts/documentation-lookup/48_sdk_harness.md:
##########
@@ -0,0 +1,21 @@
+Prompt:
+What is SDK Harness in Apache Beam?
+
+Response:
+Interoperability between Apache Beam SDKs and runners is a key aspect of
Apache Beam. The [portability
framework](https://beam.apache.org/roadmap/portability/) introduces
well-defined, language-neutral data structures and protocols between the SDK
and runner. This interop layer – called the `Portability API` – ensures that
SDKs and runners can work with each other uniformly, reducing the
interoperability burden for both SDKs and runners to a constant effort.
+
+The portability API consists of a set of smaller contracts that isolate SDKs
and runners for job submission, management and execution. These contracts use
`protobuf`s and `gRPC` for broad language support.
+
+All SDKs currently support the portability framework. See the [Portability
support
table](https://docs.google.com/spreadsheets/d/1KDa_FGn1ShjomGd-UUDOhuh2q73de2tPz6BqHpzqvNI/edit#gid=0)
for details.
+
+The SDK harness is a SDK-provided program responsible for executing user code
and is run separately from the runner. SDK harness initialization relies on the
Provision and `Artifact API`s for obtaining staged files, pipeline options and
environment information.
+
+Apache Beam allows configuration of the SDK harness to accommodate varying
cluster setups:
+
Review Comment:
```suggestion
```
##########
learning/prompts/documentation-lookup/48_sdk_harness.md:
##########
@@ -0,0 +1,21 @@
+Prompt:
+What is SDK Harness in Apache Beam?
+
+Response:
+Interoperability between Apache Beam SDKs and runners is a key aspect of
Apache Beam. The [portability
framework](https://beam.apache.org/roadmap/portability/) introduces
well-defined, language-neutral data structures and protocols between the SDK
and runner. This interop layer – called the `Portability API` – ensures that
SDKs and runners can work with each other uniformly, reducing the
interoperability burden for both SDKs and runners to a constant effort.
+
+The portability API consists of a set of smaller contracts that isolate SDKs
and runners for job submission, management and execution. These contracts use
`protobuf`s and `gRPC` for broad language support.
+
+All SDKs currently support the portability framework. See the [Portability
support
table](https://docs.google.com/spreadsheets/d/1KDa_FGn1ShjomGd-UUDOhuh2q73de2tPz6BqHpzqvNI/edit#gid=0)
for details.
+
+The SDK harness is a SDK-provided program responsible for executing user code
and is run separately from the runner. SDK harness initialization relies on the
Provision and `Artifact API`s for obtaining staged files, pipeline options and
environment information.
+
+Apache Beam allows configuration of the SDK harness to accommodate varying
cluster setups:
+
+* **environment_type**: determines where user code will be executed:
Review Comment:
```suggestion
1. **`environment_type`**: determines where user code is executed. The
`environment_config` parameter configures the environment based on the value of
`environment_type`:
```
##########
learning/prompts/documentation-lookup/48_sdk_harness.md:
##########
@@ -0,0 +1,21 @@
+Prompt:
+What is SDK Harness in Apache Beam?
+
+Response:
+Interoperability between Apache Beam SDKs and runners is a key aspect of
Apache Beam. The [portability
framework](https://beam.apache.org/roadmap/portability/) introduces
well-defined, language-neutral data structures and protocols between the SDK
and runner. This interop layer – called the `Portability API` – ensures that
SDKs and runners can work with each other uniformly, reducing the
interoperability burden for both SDKs and runners to a constant effort.
+
+The portability API consists of a set of smaller contracts that isolate SDKs
and runners for job submission, management and execution. These contracts use
`protobuf`s and `gRPC` for broad language support.
+
+All SDKs currently support the portability framework. See the [Portability
support
table](https://docs.google.com/spreadsheets/d/1KDa_FGn1ShjomGd-UUDOhuh2q73de2tPz6BqHpzqvNI/edit#gid=0)
for details.
+
+The SDK harness is a SDK-provided program responsible for executing user code
and is run separately from the runner. SDK harness initialization relies on the
Provision and `Artifact API`s for obtaining staged files, pipeline options and
environment information.
+
+Apache Beam allows configuration of the SDK harness to accommodate varying
cluster setups:
+
+* **environment_type**: determines where user code will be executed:
+ * **DOCKER**: User code is executed within a container started on each
worker node. This requires docker to be installed on worker nodes (default).
Use `environment_config` to specify the Docker image URL. Official Docker
images are used by default. Alternatively, you can build your own image by
following the instructions
[here](https://beam.apache.org/documentation/runtime/environments/). Prebuilt
SDK container images are released per supported language during Beam releases
and pushed to [Docker
Hub](https://hub.docker.com/search?q=apache%2Fbeam&type=image).
+ * **PROCESS**: User code is executed by processes that are automatically
started by the runner on each worker node.
+ * **EXTERNAL**: User code will be dispatched to an external service. Use
`environment_config` to specify the address for the external service, e.g.
`localhost:50000`.
+ * **LOOPBACK**: User code is executed within the same process that submitted
the pipeline.
Review Comment:
```suggestion
* `LOOPBACK`: executes user code within the same process that submitted
the pipeline.
```
##########
learning/prompts/documentation-lookup/48_sdk_harness.md:
##########
@@ -0,0 +1,21 @@
+Prompt:
+What is SDK Harness in Apache Beam?
+
+Response:
+Interoperability between Apache Beam SDKs and runners is a key aspect of
Apache Beam. The [portability
framework](https://beam.apache.org/roadmap/portability/) introduces
well-defined, language-neutral data structures and protocols between the SDK
and runner. This interop layer – called the `Portability API` – ensures that
SDKs and runners can work with each other uniformly, reducing the
interoperability burden for both SDKs and runners to a constant effort.
+
+The portability API consists of a set of smaller contracts that isolate SDKs
and runners for job submission, management and execution. These contracts use
`protobuf`s and `gRPC` for broad language support.
+
+All SDKs currently support the portability framework. See the [Portability
support
table](https://docs.google.com/spreadsheets/d/1KDa_FGn1ShjomGd-UUDOhuh2q73de2tPz6BqHpzqvNI/edit#gid=0)
for details.
+
+The SDK harness is a SDK-provided program responsible for executing user code
and is run separately from the runner. SDK harness initialization relies on the
Provision and `Artifact API`s for obtaining staged files, pipeline options and
environment information.
+
+Apache Beam allows configuration of the SDK harness to accommodate varying
cluster setups:
+
+* **environment_type**: determines where user code will be executed:
+ * **DOCKER**: User code is executed within a container started on each
worker node. This requires docker to be installed on worker nodes (default).
Use `environment_config` to specify the Docker image URL. Official Docker
images are used by default. Alternatively, you can build your own image by
following the instructions
[here](https://beam.apache.org/documentation/runtime/environments/). Prebuilt
SDK container images are released per supported language during Beam releases
and pushed to [Docker
Hub](https://hub.docker.com/search?q=apache%2Fbeam&type=image).
+ * **PROCESS**: User code is executed by processes that are automatically
started by the runner on each worker node.
Review Comment:
```suggestion
* `PROCESS`: executes user code through processes that are automatically
initiated by the runner on each worker node.
```
##########
learning/prompts/documentation-lookup/48_sdk_harness.md:
##########
@@ -0,0 +1,21 @@
+Prompt:
+What is SDK Harness in Apache Beam?
+
+Response:
+Interoperability between Apache Beam SDKs and runners is a key aspect of
Apache Beam. The [portability
framework](https://beam.apache.org/roadmap/portability/) introduces
well-defined, language-neutral data structures and protocols between the SDK
and runner. This interop layer – called the `Portability API` – ensures that
SDKs and runners can work with each other uniformly, reducing the
interoperability burden for both SDKs and runners to a constant effort.
+
+The portability API consists of a set of smaller contracts that isolate SDKs
and runners for job submission, management and execution. These contracts use
`protobuf`s and `gRPC` for broad language support.
+
+All SDKs currently support the portability framework. See the [Portability
support
table](https://docs.google.com/spreadsheets/d/1KDa_FGn1ShjomGd-UUDOhuh2q73de2tPz6BqHpzqvNI/edit#gid=0)
for details.
+
+The SDK harness is a SDK-provided program responsible for executing user code
and is run separately from the runner. SDK harness initialization relies on the
Provision and `Artifact API`s for obtaining staged files, pipeline options and
environment information.
+
+Apache Beam allows configuration of the SDK harness to accommodate varying
cluster setups:
+
+* **environment_type**: determines where user code will be executed:
+ * **DOCKER**: User code is executed within a container started on each
worker node. This requires docker to be installed on worker nodes (default).
Use `environment_config` to specify the Docker image URL. Official Docker
images are used by default. Alternatively, you can build your own image by
following the instructions
[here](https://beam.apache.org/documentation/runtime/environments/). Prebuilt
SDK container images are released per supported language during Beam releases
and pushed to [Docker
Hub](https://hub.docker.com/search?q=apache%2Fbeam&type=image).
Review Comment:
```suggestion
* `DOCKER`: executes user code within a container on each worker node.
Docker must be installed on worker nodes. You can specify the Docker image URL
using the `environment_config` parameter. Prebuilt SDK container images are
available with each Apache Beam release and pushed to [Docker
Hub](https://hub.docker.com/search?q=apache%2Fbeam&type=image). You can also
[build your custom
image](https://beam.apache.org/documentation/runtime/environments/).
```
##########
learning/prompts/documentation-lookup/48_sdk_harness.md:
##########
@@ -0,0 +1,21 @@
+Prompt:
+What is SDK Harness in Apache Beam?
+
+Response:
+Interoperability between Apache Beam SDKs and runners is a key aspect of
Apache Beam. The [portability
framework](https://beam.apache.org/roadmap/portability/) introduces
well-defined, language-neutral data structures and protocols between the SDK
and runner. This interop layer – called the `Portability API` – ensures that
SDKs and runners can work with each other uniformly, reducing the
interoperability burden for both SDKs and runners to a constant effort.
+
+The Portability API consists of a set of smaller contracts that isolate SDKs
and runners for job submission, management, and execution. These contracts use
`protobuf`s and `gRPC` for broad language support.
+
+All SDKs currently support the portability framework. See the [Portability
support
table](https://docs.google.com/spreadsheets/d/1KDa_FGn1ShjomGd-UUDOhuh2q73de2tPz6BqHpzqvNI/edit#gid=0)
for details.
+
+The SDK harness is an SDK-provided program responsible for executing user code
and is run separately from the runner. SDK harness initialization relies on the
`Provision API` and `Artifact API` to obtain staged files, pipeline options,
and environment information.
+
+Apache Beam allows configuration of the SDK harness to accommodate varying
cluster setups:
Review Comment:
```suggestion
Apache Beam offers configuration options for the SDK harness to cater to
diverse cluster setups. These options include:
```
##########
learning/prompts/documentation-lookup/49_transform_service.md:
##########
@@ -0,0 +1,58 @@
+Prompt:
+What is a Transformer Service in Apache Beam?
Review Comment:
```suggestion
What is Transform service in Apache Beam?
```
##########
learning/prompts/documentation-lookup/48_sdk_harness.md:
##########
@@ -0,0 +1,21 @@
+Prompt:
+What is SDK Harness in Apache Beam?
+
+Response:
+Interoperability between Apache Beam SDKs and runners is a key aspect of
Apache Beam. The [portability
framework](https://beam.apache.org/roadmap/portability/) introduces
well-defined, language-neutral data structures and protocols between the SDK
and runner. This interop layer – called the `Portability API` – ensures that
SDKs and runners can work with each other uniformly, reducing the
interoperability burden for both SDKs and runners to a constant effort.
+
+The Portability API consists of a set of smaller contracts that isolate SDKs
and runners for job submission, management, and execution. These contracts use
`protobuf`s and `gRPC` for broad language support.
+
+All SDKs currently support the portability framework. See the [Portability
support
table](https://docs.google.com/spreadsheets/d/1KDa_FGn1ShjomGd-UUDOhuh2q73de2tPz6BqHpzqvNI/edit#gid=0)
for details.
+
+The SDK harness is an SDK-provided program responsible for executing user code
and is run separately from the runner. SDK harness initialization relies on the
`Provision API` and `Artifact API` to obtain staged files, pipeline options,
and environment information.
+
+Apache Beam allows configuration of the SDK harness to accommodate varying
cluster setups:
+
+* **environment_type**: determines where user code will be executed:
+ * **DOCKER**: User code is executed within a container started on each
worker node. This requires docker to be installed on worker nodes (default).
Use `environment_config` to specify the Docker image URL. Official Docker
images are used by default. Alternatively, you can build your own image by
following the instructions
[here](https://beam.apache.org/documentation/runtime/environments/). Prebuilt
SDK container images are released per supported language during Beam releases
and pushed to [Docker
Hub](https://hub.docker.com/search?q=apache%2Fbeam&type=image).
+ * **PROCESS**: User code is executed by processes that are automatically
started by the runner on each worker node.
+ * **EXTERNAL**: User code will be dispatched to an external service. Use
`environment_config` to specify the address for the external service, e.g.
`localhost:50000`.
Review Comment:
```suggestion
* `EXTERNAL`: dispatches user code to an external service. Use the
`environment_config` parameter to specify the service address, for example,
`localhost:50000`.
```
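As an illustrative sketch only (not Beam's actual option parser), the three `environment_type` values and the companion `environment_config` parameter described above could be modeled like this; the helper name `parse_harness_options` is hypothetical:

```python
import argparse

# The three environment types from the text above: DOCKER (default),
# PROCESS, and EXTERNAL.
VALID_ENVIRONMENT_TYPES = {"DOCKER", "PROCESS", "EXTERNAL"}

def parse_harness_options(argv):
    """Hypothetical helper mimicking how a portable runner reads these flags."""
    parser = argparse.ArgumentParser()
    # Where user code runs: a Docker container, a runner-started process,
    # or an external service.
    parser.add_argument("--environment_type", default="DOCKER",
                        choices=sorted(VALID_ENVIRONMENT_TYPES))
    # Meaning depends on environment_type: a Docker image URL for DOCKER,
    # or a service address such as localhost:50000 for EXTERNAL.
    parser.add_argument("--environment_config", default=None)
    return parser.parse_args(argv)

opts = parse_harness_options(
    ["--environment_type", "EXTERNAL",
     "--environment_config", "localhost:50000"])
print(opts.environment_type, opts.environment_config)
```

Note that with no flags supplied, the sketch falls back to `DOCKER`, matching the default behavior described in the text.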
##########
learning/prompts/documentation-lookup/49_transform_service.md:
##########
@@ -0,0 +1,58 @@
+Prompt:
+What is a Transformer Service in Apache Beam?
+
+Response:
+The `Transform service` is a [Docker Compose
service](https://docs.docker.com/compose/) included in Apache Beam SDK
versions 2.49.0 and later. You can use the Transform service to upgrade or
downgrade the Beam SDK version of individual supported transforms in your
pipeline without changing the overall Beam version of the pipeline. This
feature is only supported for Beam Java SDK 2.53.0 and later.
+
+To use this feature simply execute Java pipeline with additional pipeline
options specifying the URNs of the transforms you want to upgrade and the Beam
version you want to upgrade to:
+
+```java
+--transformsToOverride=beam:transform:org.apache.beam:bigquery_read:v1
--transformServiceBeamVersion=2.xy.z
+```
+
+In the above example, `--transformsToOverride` specifies the URN of the
transform you want to upgrade or downgrade, and `--transformServiceBeamVersion`
specifies the Beam version you want to upgrade to.
Review Comment:
```suggestion
In the provided example, `--transformsToOverride` specifies the URN of the
transform to upgrade or downgrade, while `--transformServiceBeamVersion`
specifies the target Beam version.
```
##########
learning/prompts/documentation-lookup/49_transform_service.md:
##########
@@ -0,0 +1,58 @@
+Prompt:
+What is a Transformer Service in Apache Beam?
+
+Response:
+The `Transform service` is a [Docker Compose
service](https://docs.docker.com/compose/) included in Apache Beam SDK
versions 2.49.0 and later. You can use the Transform service to upgrade or
downgrade the Beam SDK version of individual supported transforms in your
pipeline without changing the overall Beam version of the pipeline. This
feature is only supported for Beam Java SDK 2.53.0 and later.
+
+To use this feature simply execute Java pipeline with additional pipeline
options specifying the URNs of the transforms you want to upgrade and the Beam
version you want to upgrade to:
Review Comment:
```suggestion
You can utilize the Transform service to upgrade specific transforms only if
you are using Beam Java SDK 2.53.0 and later. To employ this feature, execute a
Java pipeline with additional pipeline options specifying the URNs of the
transforms you want to upgrade and the desired Apache Beam version:
```
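As a minimal sketch of the two pipeline options in question (this mimics flag parsing only and is not Beam's implementation; the 2.53.0 target version is taken from the text above), the override flags could be exercised like this:

```python
import argparse

# Illustrative parser for the two Transform service options described above.
parser = argparse.ArgumentParser()
# URN of the transform to upgrade or downgrade.
parser.add_argument("--transformsToOverride")
# Target Beam version for the overridden transform.
parser.add_argument("--transformServiceBeamVersion")

args = parser.parse_args([
    "--transformsToOverride=beam:transform:org.apache.beam:bigquery_read:v1",
    "--transformServiceBeamVersion=2.53.0",
])
print(args.transformsToOverride, args.transformServiceBeamVersion)
```

In a real pipeline these flags would be passed on the Java pipeline's command line, as shown in the quoted snippet.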
##########
learning/prompts/documentation-lookup/49_transform_service.md:
##########
@@ -0,0 +1,58 @@
+Prompt:
+What is a Transformer Service in Apache Beam?
+
+Response:
+The `Transform service` is a [Docker Compose
service](https://docs.docker.com/compose/) included in Apache Beam SDK
versions 2.49.0 and later. You can use the Transform service to upgrade or
downgrade the Beam SDK version of individual supported transforms in your
pipeline without changing the overall Beam version of the pipeline. This
feature is only supported for Beam Java SDK 2.53.0 and later.
+
+To use this feature simply execute Java pipeline with additional pipeline
options specifying the URNs of the transforms you want to upgrade and the Beam
version you want to upgrade to:
+
+```java
+--transformsToOverride=beam:transform:org.apache.beam:bigquery_read:v1
--transformServiceBeamVersion=2.xy.z
+```
+
+In the above example, `--transformsToOverride` specifies the URN of the
transform you want to upgrade or downgrade, and `--transformServiceBeamVersion`
specifies the Beam version you want to upgrade to.
+
+The framework will automatically download the specified version of Docker
containers for the transforms and use them in the pipeline. You must have
Docker installed on the machine that starts the service.
+
+Currently the following transforms are supported:
+
+* BigQuery read transform: `beam:transform:org.apache.beam:bigquery_read:v1`
+* BigQuery write transform: `beam:transform:org.apache.beam:bigquery_write:v1`
+* Kafka read transform:
`beam:transform:org.apache.beam:kafka_read_with_metadata:v2`
+* Kafka write transform: `beam:transform:org.apache.beam:kafka_write:v2`
+
+The Transform service implements the Beam expansion API. This means you can
use the Transform service to construct and execute multi-language pipelines.
For example, you can build a Python pipeline that uses the Java `KafkaIO`
transform and execute it without installing Java locally.
+
+The Transform service can be started automatically by the Apache Beam SDK or
manually by users:
+
+```java
+java -jar beam-sdks-java-transform-service-app-<Beam version for the jar>.jar
--port <port> --beam_version <Beam version for the transform service>
--project_name <a unique ID for the transform service> --command up
+```
+
+Beam transform service includes a number of transforms implemented in the
Apache Beam Java and Python SDKs:
+
+* Java transforms: Google Cloud I/O connectors, the Kafka I/O connector, and
the JDBC I/O connector
+* Python transforms: all portable transforms implemented within the Apache
Beam Python SDK, such as RunInference and DataFrame transforms.
+
+
Review Comment:
```suggestion
The Beam Transform service includes several portable transforms implemented
in the Apache Beam Java and Python SDKs:
* Java transforms: Google Cloud I/O connectors, the Kafka I/O connector, and
the JDBC I/O connector.
* Python transforms: all portable transforms implemented within the Apache
Beam Python SDK, such as RunInference and DataFrame transforms.
```
##########
learning/prompts/documentation-lookup/49_transform_service.md:
##########
@@ -0,0 +1,58 @@
+Prompt:
+What is a Transformer Service in Apache Beam?
+
+Response:
+The `Transform service` is a [Docker Compose
service](https://docs.docker.com/compose/) included in Apache Beam SDK
versions 2.49.0 and later. You can use the Transform service to upgrade or
downgrade the Beam SDK version of individual supported transforms in your
pipeline without changing the overall Beam version of the pipeline. This
feature is only supported for Beam Java SDK 2.53.0 and later.
+
+To use this feature simply execute Java pipeline with additional pipeline
options specifying the URNs of the transforms you want to upgrade and the Beam
version you want to upgrade to:
+
+```java
+--transformsToOverride=beam:transform:org.apache.beam:bigquery_read:v1
--transformServiceBeamVersion=2.xy.z
+```
+
+In the above example, `--transformsToOverride` specifies the URN of the
transform you want to upgrade or downgrade, and `--transformServiceBeamVersion`
specifies the Beam version you want to upgrade to.
+
+The framework will automatically download the specified version of Docker
containers for the transforms and use them in the pipeline. You must have
Docker installed on the machine that starts the service.
Review Comment:
```suggestion
The framework automatically downloads the specified version of Docker
containers for the transforms and uses them in the pipeline. You must have
Docker installed on the machine that starts the service.
```
##########
learning/prompts/documentation-lookup/49_transform_service.md:
##########
@@ -0,0 +1,58 @@
+Prompt:
+What is a Transformer Service in Apache Beam?
+
+Response:
+The `Transform service` is a [Docker Compose
service](https://docs.docker.com/compose/) included in Apache Beam SDK
versions 2.49.0 and later. You can use the Transform service to upgrade or
downgrade the Beam SDK version of individual supported transforms in your
pipeline without changing the overall Beam version of the pipeline. This
feature is only supported for Beam Java SDK 2.53.0 and later.
+
+To use this feature simply execute Java pipeline with additional pipeline
options specifying the URNs of the transforms you want to upgrade and the Beam
version you want to upgrade to:
+
+```java
+--transformsToOverride=beam:transform:org.apache.beam:bigquery_read:v1
--transformServiceBeamVersion=2.xy.z
+```
+
+In the above example, `--transformsToOverride` specifies the URN of the
transform you want to upgrade or downgrade, and `--transformServiceBeamVersion`
specifies the Beam version you want to upgrade to.
+
+The framework will automatically download the specified version of Docker
containers for the transforms and use them in the pipeline. You must have
Docker installed on the machine that starts the service.
+
+Currently the following transforms are supported:
+
+* BigQuery read transform: `beam:transform:org.apache.beam:bigquery_read:v1`
+* BigQuery write transform: `beam:transform:org.apache.beam:bigquery_write:v1`
+* Kafka read transform:
`beam:transform:org.apache.beam:kafka_read_with_metadata:v2`
+* Kafka write transform: `beam:transform:org.apache.beam:kafka_write:v2`
+
Review Comment:
```suggestion
* BigQuery read: `beam:transform:org.apache.beam:bigquery_read:v1`
* BigQuery write: `beam:transform:org.apache.beam:bigquery_write:v1`
* Kafka read: `beam:transform:org.apache.beam:kafka_read_with_metadata:v2`
* Kafka write: `beam:transform:org.apache.beam:kafka_write:v2`
```
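The supported transform URNs quoted above can be collected into a small lookup table; this is a convenience sketch for checking eligibility, not part of the Beam API:

```python
# Supported transform URNs from the list above.
SUPPORTED_TRANSFORM_URNS = {
    "bigquery_read": "beam:transform:org.apache.beam:bigquery_read:v1",
    "bigquery_write": "beam:transform:org.apache.beam:bigquery_write:v1",
    "kafka_read": "beam:transform:org.apache.beam:kafka_read_with_metadata:v2",
    "kafka_write": "beam:transform:org.apache.beam:kafka_write:v2",
}

def is_supported(urn):
    """Check whether a transform URN can be overridden via the Transform service."""
    return urn in SUPPORTED_TRANSFORM_URNS.values()

print(is_supported("beam:transform:org.apache.beam:bigquery_read:v1"))
```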
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]