dariabezkorovaina commented on code in PR #30499:
URL: https://github.com/apache/beam/pull/30499#discussion_r1516106352
##########
learning/prompts/documentation-lookup-nolinks/48_sdk_harness.md:
##########
@@ -0,0 +1,19 @@
+Prompt:
+What is SDK Harness in Apache Beam?
Review Comment:
```suggestion
What is SDK harness in Apache Beam?
```
##########
learning/prompts/documentation-lookup-nolinks/48_sdk_harness.md:
##########
@@ -0,0 +1,19 @@
+Prompt:
+What is SDK Harness in Apache Beam?
+
+Response:
+Interoperability between Apache Beam SDKs and runners is a key aspect of
Apache Beam. The portability framework introduces well-defined,
language-neutral data structures and protocols between the SDK and runner. This
interop layer – called the `Portability API` – ensures that SDKs and runners
can work with each other uniformly, reducing the interoperability burden for
both SDKs and runners to a constant effort.
+
+The portability API consists of a set of smaller contracts that isolate SDKs
and runners for job submission, management and execution. These contracts use
`protobuf`s and `gRPC` for broad language support. All SDKs currently support
the portability framework.
+
+The SDK harness is a SDK-provided program responsible for executing user code
and is run separately from the runner. SDK harness initialization relies on the
Provision and `Artifact API`s for obtaining staged files, pipeline options and
environment information.
+
+Apache Beam allows configuration of the SDK harness to accommodate varying
cluster setups:
+
+* **environment_type**: determines where user code will be executed:
+ * **DOCKER**: User code is executed within a container started on each
worker node. This requires docker to be installed on worker nodes (default).
Use `environment_config` to specify the Docker image URL. Official Docker
images are used by default. Alternatively, you can build your own image.
Prebuilt SDK container images are released per supported language during Beam
releases and pushed to Docker Hub.
Review Comment:
```suggestion
* `DOCKER`: executes user code within a container on each worker node.
Docker must be installed on worker nodes. You can specify the Docker image URL
using the `environment_config` parameter. Prebuilt SDK container images are
available with each Apache Beam release and pushed to Docker Hub. You can also
build your custom image.
```
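For context on the hunk above: `environment_type` and `environment_config` are ordinary portable pipeline options. A minimal sketch of passing them when launching a Python pipeline with the SDK harness in a Docker container (the runner, job endpoint, bucket paths, and image tag below are illustrative placeholders, not taken from this PR):

```
# Run wordcount with the SDK harness in a Docker container on each worker.
# Docker must be installed on the worker nodes.
python -m apache_beam.examples.wordcount \
  --input=gs://my-bucket/input.txt \
  --output=gs://my-bucket/counts \
  --runner=PortableRunner \
  --job_endpoint=localhost:8099 \
  --environment_type=DOCKER \
  --environment_config=apache/beam_python3.10_sdk:2.53.0
```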
##########
learning/prompts/documentation-lookup-nolinks/49_transform_service.md:
##########
@@ -0,0 +1,58 @@
+Prompt:
+What is a Transformer Service in Apache Beam?
+
+Response:
+The `Transform service` is Docker Compose service included into Apache Beam
SDK versions 2.49.0 and later. You can use Transform service to upgrade or
downgrade Beam SDK version of individual supported transforms in your pipeline
without changing the Beam version of the pipeline. This feature is only
supported for Beam Java SDK 2.53.0 and later.
+
+To use this feature simply execute Java pipeline with additional pipeline
options specifying the URNs of the transforms you want to upgrade and the Beam
version you want to upgrade to:
Review Comment:
```suggestion
You can utilize the Transform service to upgrade specific transforms only if
you are using Beam Java SDK 2.53.0 and later. To employ this feature, execute a
Java pipeline with additional pipeline options specifying the URNs of the
transforms you want to upgrade and the desired Apache Beam version:
```
##########
learning/prompts/documentation-lookup-nolinks/48_sdk_harness.md:
##########
@@ -0,0 +1,19 @@
+Prompt:
+What is SDK Harness in Apache Beam?
+
+Response:
+Interoperability between Apache Beam SDKs and runners is a key aspect of
Apache Beam. The portability framework introduces well-defined,
language-neutral data structures and protocols between the SDK and runner. This
interop layer – called the `Portability API` – ensures that SDKs and runners
can work with each other uniformly, reducing the interoperability burden for
both SDKs and runners to a constant effort.
+
+The portability API consists of a set of smaller contracts that isolate SDKs
and runners for job submission, management and execution. These contracts use
`protobuf`s and `gRPC` for broad language support. All SDKs currently support
the portability framework.
+
+The SDK harness is a SDK-provided program responsible for executing user code
and is run separately from the runner. SDK harness initialization relies on the
Provision and `Artifact API`s for obtaining staged files, pipeline options and
environment information.
+
+Apache Beam allows configuration of the SDK harness to accommodate varying
cluster setups:
Review Comment:
```suggestion
Apache Beam offers configuration options for the SDK harness to cater to
diverse cluster setups. These options include:
```
##########
learning/prompts/documentation-lookup/48_sdk_harness.md:
##########
@@ -0,0 +1,21 @@
+Prompt:
+What is SDK Harness in Apache Beam?
Review Comment:
```suggestion
What is SDK harness in Apache Beam?
```
##########
learning/prompts/documentation-lookup-nolinks/49_transform_service.md:
##########
@@ -0,0 +1,58 @@
+Prompt:
+What is a Transformer Service in Apache Beam?
+
+Response:
+The `Transform service` is Docker Compose service included into Apache Beam
SDK versions 2.49.0 and later. You can use Transform service to upgrade or
downgrade Beam SDK version of individual supported transforms in your pipeline
without changing the Beam version of the pipeline. This feature is only
supported for Beam Java SDK 2.53.0 and later.
+
+To use this feature simply execute Java pipeline with additional pipeline
options specifying the URNs of the transforms you want to upgrade and the Beam
version you want to upgrade to:
+
+```java
+--transformsToOverride=beam:transform:org.apache.beam:bigquery_read:v1
--transformServiceBeamVersion=2.xy.z
+```
+
+In the above example, `--transformsToOverride` specifies the URN of the
transform you want to upgrade or downgrade, and `--transformServiceBeamVersion`
specifies the Beam version you want to upgrade to.
+
+The framework will automatically download the specified version of Docker
containers for the transforms and use them in the pipeline. You must have
Docker installed on the machine that starts the service.
+
+Currently the following transforms are supported:
+
+* BigQuery read transform: `beam:transform:org.apache.beam:bigquery_read:v1`
+* BigQuery write transform: `beam:transform:org.apache.beam:bigquery_write:v1`
+* Kafka read transform:
`beam:transform:org.apache.beam:kafka_read_with_metadata:v2`
+* Kafka write transform: `beam:transform:org.apache.beam:kafka_write:v2`
+
+Transform service implement Beam expansion API. This means you can use the
Transform service to construct and execute multi-language pipelines. For
example, you can build a Python pipeline that uses Java `KafkaIO` transform and
execute in without installing Java locally.
Review Comment:
```suggestion
The Transform service implements the Beam expansion API, enabling
multi-language pipelines to leverage it for expanding supported transforms.
This feature allows you to create and run multi-language pipelines without
additional language runtimes. For instance, you can build a Python pipeline
that utilizes a Java `KafkaIO` transform without the need to install Java
locally.
```
##########
learning/prompts/documentation-lookup/49_transform_service.md:
##########
@@ -0,0 +1,58 @@
+Prompt:
+What is a Transformer Service in Apache Beam?
+
+Response:
+The `Transform service` is [Docker Compose
service](https://docs.docker.com/compose/) included into Apache Beam SDK
versions 2.49.0 and later. You can use Transform service to upgrade or
downgrade Beam SDK version of individual supported transforms in your pipeline
without changing the Beam version of the pipeline. This feature is only
supported for Beam Java SDK 2.53.0 and later.
Review Comment:
```suggestion
The Transform service is a [Docker
Compose](https://docs.docker.com/compose/) service included in Apache Beam SDK
versions 2.49.0 and later. It enables you to upgrade or downgrade the Beam SDK
version of individual supported transforms in your pipeline without changing
the overall Beam version of the pipeline. Additionally, you can utilize the
Transform service to create and execute multi-language pipelines without
needing to install support for additional language runtimes.
```
##########
learning/prompts/documentation-lookup/48_sdk_harness.md:
##########
@@ -0,0 +1,21 @@
+Prompt:
+What is SDK Harness in Apache Beam?
+
+Response:
+Interoperability between Apache Beam SDKs and runners is a key aspect of
Apache Beam. The [portability
framework](https://beam.apache.org/roadmap/portability/) introduces
well-defined, language-neutral data structures and protocols between the SDK
and runner. This interop layer – called the `Portability API` – ensures that
SDKs and runners can work with each other uniformly, reducing the
interoperability burden for both SDKs and runners to a constant effort.
+
+The portability API consists of a set of smaller contracts that isolate SDKs
and runners for job submission, management and execution. These contracts use
`protobuf`s and `gRPC` for broad language support.
+
+All SDKs currently support the portability framework. See the [Portability
support
table](https://docs.google.com/spreadsheets/d/1KDa_FGn1ShjomGd-UUDOhuh2q73de2tPz6BqHpzqvNI/edit#gid=0)
for details.
+
+The SDK harness is a SDK-provided program responsible for executing user code
and is run separately from the runner. SDK harness initialization relies on the
Provision and `Artifact API`s for obtaining staged files, pipeline options and
environment information.
+
+Apache Beam allows configuration of the SDK harness to accommodate varying
cluster setups:
+
+* **environment_type**: determines where user code will be executed:
+ * **DOCKER**: User code is executed within a container started on each
worker node. This requires docker to be installed on worker nodes (default).
Use `environment_config` to specify the Docker image URL. Official Docker
images are used by default. Alternatively, you can build your own image by
following the instructions
[here](https://beam.apache.org/documentation/runtime/environments/). Prebuilt
SDK container images are released per supported language during Beam releases
and pushed to [Docker
Hub](https://hub.docker.com/search?q=apache%2Fbeam&type=image).
+ * **PROCESS**: User code is executed by processes that are automatically
started by the runner on each worker node.
+ * **EXTERNAL**: User code will be dispatched to an external service. Use
`environment_config` to specify the address for the external service, e.g.
`localhost:50000`.
+ * **LOOPBACK**: User code is executed within the same process that submitted
the pipeline.
+
+* **sdk_worker_parallelism**: sets the number of SDK workers that run on each
worker node. The default is 1. If 0, the value is automatically set by the
runner by looking at different parameters, such as the number of CPU cores on
the worker machine.
Review Comment:
```suggestion
2. **`sdk_worker_parallelism`**: determines the number of SDK workers per
worker node. The default value is 1, but setting it to 0 enables automatic
determination by the runner based on factors like the number of CPU cores on
the worker machine.
```
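As a side note on the `sdk_worker_parallelism` option discussed in this hunk, it is passed like any other portable pipeline option; a hedged fragment (the value is illustrative):

```
# Default is 1 SDK worker per worker node; 0 lets the runner choose
# automatically based on factors such as the node's CPU core count.
--sdk_worker_parallelism=0
```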
##########
learning/prompts/documentation-lookup-nolinks/48_sdk_harness.md:
##########
@@ -0,0 +1,19 @@
+Prompt:
+What is SDK Harness in Apache Beam?
+
+Response:
+Interoperability between Apache Beam SDKs and runners is a key aspect of
Apache Beam. The portability framework introduces well-defined,
language-neutral data structures and protocols between the SDK and runner. This
interop layer – called the `Portability API` – ensures that SDKs and runners
can work with each other uniformly, reducing the interoperability burden for
both SDKs and runners to a constant effort.
+
+The portability API consists of a set of smaller contracts that isolate SDKs
and runners for job submission, management and execution. These contracts use
`protobuf`s and `gRPC` for broad language support. All SDKs currently support
the portability framework.
+
+The SDK harness is a SDK-provided program responsible for executing user code
and is run separately from the runner. SDK harness initialization relies on the
Provision and `Artifact API`s for obtaining staged files, pipeline options and
environment information.
+
+Apache Beam allows configuration of the SDK harness to accommodate varying
cluster setups:
+
+* **environment_type**: determines where user code will be executed:
+ * **DOCKER**: User code is executed within a container started on each
worker node. This requires docker to be installed on worker nodes (default).
Use `environment_config` to specify the Docker image URL. Official Docker
images are used by default. Alternatively, you can build your own image.
Prebuilt SDK container images are released per supported language during Beam
releases and pushed to Docker Hub.
+ * **PROCESS**: User code is executed by processes that are automatically
started by the runner on each worker node.
+ * **EXTERNAL**: User code will be dispatched to an external service. Use
`environment_config` to specify the address for the external service, e.g.
`localhost:50000`.
+ * **LOOPBACK**: User code is executed within the same process that submitted
the pipeline.
+
+* **sdk_worker_parallelism**: sets the number of SDK workers that run on each
worker node. The default is 1. If 0, the value is automatically set by the
runner by looking at different parameters, such as the number of CPU cores on
the worker machine.
Review Comment:
```suggestion
2. **`sdk_worker_parallelism`**: determines the number of SDK workers per
worker node. The default value is 1, but setting it to 0 enables automatic
determination by the runner based on factors like the number of CPU cores on
the worker machine.
```
##########
learning/prompts/documentation-lookup-nolinks/48_sdk_harness.md:
##########
@@ -0,0 +1,19 @@
+Prompt:
+What is SDK Harness in Apache Beam?
+
+Response:
+Interoperability between Apache Beam SDKs and runners is a key aspect of
Apache Beam. The portability framework introduces well-defined,
language-neutral data structures and protocols between the SDK and runner. This
interop layer – called the `Portability API` – ensures that SDKs and runners
can work with each other uniformly, reducing the interoperability burden for
both SDKs and runners to a constant effort.
+
+The portability API consists of a set of smaller contracts that isolate SDKs
and runners for job submission, management and execution. These contracts use
`protobuf`s and `gRPC` for broad language support. All SDKs currently support
the portability framework.
+
+The SDK harness is a SDK-provided program responsible for executing user code
and is run separately from the runner. SDK harness initialization relies on the
Provision and `Artifact API`s for obtaining staged files, pipeline options and
environment information.
+
+Apache Beam allows configuration of the SDK harness to accommodate varying
cluster setups:
+
Review Comment:
```suggestion
```
##########
learning/prompts/documentation-lookup-nolinks/48_sdk_harness.md:
##########
@@ -0,0 +1,19 @@
+Prompt:
+What is SDK Harness in Apache Beam?
+
+Response:
+Interoperability between Apache Beam SDKs and runners is a key aspect of
Apache Beam. The portability framework introduces well-defined,
language-neutral data structures and protocols between the SDK and runner. This
interop layer – called the `Portability API` – ensures that SDKs and runners
can work with each other uniformly, reducing the interoperability burden for
both SDKs and runners to a constant effort.
+
+The portability API consists of a set of smaller contracts that isolate SDKs
and runners for job submission, management and execution. These contracts use
`protobuf`s and `gRPC` for broad language support. All SDKs currently support
the portability framework.
+
+The SDK harness is a SDK-provided program responsible for executing user code
and is run separately from the runner. SDK harness initialization relies on the
Provision and `Artifact API`s for obtaining staged files, pipeline options and
environment information.
+
+Apache Beam allows configuration of the SDK harness to accommodate varying
cluster setups:
+
+* **environment_type**: determines where user code will be executed:
+ * **DOCKER**: User code is executed within a container started on each
worker node. This requires docker to be installed on worker nodes (default).
Use `environment_config` to specify the Docker image URL. Official Docker
images are used by default. Alternatively, you can build your own image.
Prebuilt SDK container images are released per supported language during Beam
releases and pushed to Docker Hub.
+ * **PROCESS**: User code is executed by processes that are automatically
started by the runner on each worker node.
Review Comment:
```suggestion
* `PROCESS`: executes user code through processes that are automatically
initiated by the runner on each worker node.
```
##########
learning/prompts/documentation-lookup-nolinks/48_sdk_harness.md:
##########
@@ -0,0 +1,19 @@
+Prompt:
+What is SDK Harness in Apache Beam?
+
+Response:
+Interoperability between Apache Beam SDKs and runners is a key aspect of
Apache Beam. The portability framework introduces well-defined,
language-neutral data structures and protocols between the SDK and runner. This
interop layer – called the `Portability API` – ensures that SDKs and runners
can work with each other uniformly, reducing the interoperability burden for
both SDKs and runners to a constant effort.
Review Comment:
```suggestion
Interoperability between Apache Beam SDKs and runners is a key aspect of
Apache Beam. The portability framework introduces well-defined,
language-neutral data structures and protocols between the SDK and runner. This
interoperability layer, known as the 'portability API', ensures that SDKs and
runners can seamlessly work with each other, reducing the interoperability
burden for both SDKs and runners to a constant effort.
```
##########
learning/prompts/documentation-lookup/49_transform_service.md:
##########
@@ -0,0 +1,58 @@
+Prompt:
+What is a Transformer Service in Apache Beam?
+
+Response:
+The `Transform service` is [Docker Compose
service](https://docs.docker.com/compose/) included into Apache Beam SDK
versions 2.49.0 and later. You can use Transform service to upgrade or
downgrade Beam SDK version of individual supported transforms in your pipeline
without changing the Beam version of the pipeline. This feature is only
supported for Beam Java SDK 2.53.0 and later.
+
+To use this feature simply execute Java pipeline with additional pipeline
options specifying the URNs of the transforms you want to upgrade and the Beam
version you want to upgrade to:
+
+```java
+--transformsToOverride=beam:transform:org.apache.beam:bigquery_read:v1
--transformServiceBeamVersion=2.xy.z
+```
+
+In the above example, `--transformsToOverride` specifies the URN of the
transform you want to upgrade or downgrade, and `--transformServiceBeamVersion`
specifies the Beam version you want to upgrade to.
+
+The framework will automatically download the specified version of Docker
containers for the transforms and use them in the pipeline. You must have
Docker installed on the machine that starts the service.
+
+Currently the following transforms are supported:
+
+* BigQuery read transform: `beam:transform:org.apache.beam:bigquery_read:v1`
+* BigQuery write transform: `beam:transform:org.apache.beam:bigquery_write:v1`
+* Kafka read transform:
`beam:transform:org.apache.beam:kafka_read_with_metadata:v2`
+* Kafka write transform: `beam:transform:org.apache.beam:kafka_write:v2`
+
+Transform service implement Beam expansion API. This means you can use the
Transform service to construct and execute multi-language pipelines. For
example, you can build a Python pipeline that uses Java `KafkaIO` transform and
execute in without installing Java locally.
Review Comment:
```suggestion
The Transform service implements the Beam expansion API, enabling
multi-language pipelines to leverage it for expanding supported transforms.
This feature allows you to create and run multi-language pipelines without
additional language runtimes. For instance, you can build a Python pipeline
that utilizes a Java `KafkaIO` transform without the need to install Java
locally.
```
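For context on the multi-language scenario the suggested text describes: a hedged sketch of launching such a pipeline, assuming Docker is available on the launching machine (the script name is a placeholder; the Java `KafkaIO` expansion happens in a container, so no local Java installation is needed):

```
# Run a Python pipeline that uses the Java KafkaIO transform; the
# transform is expanded via a containerized service rather than a
# locally installed Java runtime.
python kafka_pipeline.py --runner=DirectRunner
```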
##########
learning/prompts/documentation-lookup-nolinks/48_sdk_harness.md:
##########
@@ -0,0 +1,19 @@
+Prompt:
+What is SDK Harness in Apache Beam?
+
+Response:
+Interoperability between Apache Beam SDKs and runners is a key aspect of
Apache Beam. The portability framework introduces well-defined,
language-neutral data structures and protocols between the SDK and runner. This
interop layer – called the `Portability API` – ensures that SDKs and runners
can work with each other uniformly, reducing the interoperability burden for
both SDKs and runners to a constant effort.
+
+The portability API consists of a set of smaller contracts that isolate SDKs
and runners for job submission, management and execution. These contracts use
`protobuf`s and `gRPC` for broad language support. All SDKs currently support
the portability framework.
+
+The SDK harness is a SDK-provided program responsible for executing user code
and is run separately from the runner. SDK harness initialization relies on the
Provision and `Artifact API`s for obtaining staged files, pipeline options and
environment information.
Review Comment:
```suggestion
The SDK harness is a program responsible for executing user code. This
program is provided by an SDK and runs separately from the runner. SDK harness
initialization relies on the provision and artifact APIs for obtaining staged
files, pipeline options, and environment information.
```
##########
learning/prompts/documentation-lookup/49_transform_service.md:
##########
@@ -0,0 +1,58 @@
+Prompt:
+What is a Transformer Service in Apache Beam?
+
+Response:
+The `Transform service` is [Docker Compose
service](https://docs.docker.com/compose/) included into Apache Beam SDK
versions 2.49.0 and later. You can use Transform service to upgrade or
downgrade Beam SDK version of individual supported transforms in your pipeline
without changing the Beam version of the pipeline. This feature is only
supported for Beam Java SDK 2.53.0 and later.
+
+To use this feature simply execute Java pipeline with additional pipeline
options specifying the URNs of the transforms you want to upgrade and the Beam
version you want to upgrade to:
+
+```java
+--transformsToOverride=beam:transform:org.apache.beam:bigquery_read:v1
--transformServiceBeamVersion=2.xy.z
+```
+
+In the above example, `--transformsToOverride` specifies the URN of the
transform you want to upgrade or downgrade, and `--transformServiceBeamVersion`
specifies the Beam version you want to upgrade to.
+
+The framework will automatically download the specified version of Docker
containers for the transforms and use them in the pipeline. You must have
Docker installed on the machine that starts the service.
+
+Currently the following transforms are supported:
+
Review Comment:
```suggestion
Currently, the Transform service can upgrade the following transforms:
```
##########
learning/prompts/documentation-lookup-nolinks/48_sdk_harness.md:
##########
@@ -0,0 +1,19 @@
+Prompt:
+What is SDK Harness in Apache Beam?
+
+Response:
+Interoperability between Apache Beam SDKs and runners is a key aspect of
Apache Beam. The portability framework introduces well-defined,
language-neutral data structures and protocols between the SDK and runner. This
interop layer – called the `Portability API` – ensures that SDKs and runners
can work with each other uniformly, reducing the interoperability burden for
both SDKs and runners to a constant effort.
+
+The portability API consists of a set of smaller contracts that isolate SDKs
and runners for job submission, management and execution. These contracts use
`protobuf`s and `gRPC` for broad language support. All SDKs currently support
the portability framework.
+
+The SDK harness is a SDK-provided program responsible for executing user code
and is run separately from the runner. SDK harness initialization relies on the
Provision and `Artifact API`s for obtaining staged files, pipeline options and
environment information.
+
+Apache Beam allows configuration of the SDK harness to accommodate varying
cluster setups:
+
+* **environment_type**: determines where user code will be executed:
+ * **DOCKER**: User code is executed within a container started on each
worker node. This requires docker to be installed on worker nodes (default).
Use `environment_config` to specify the Docker image URL. Official Docker
images are used by default. Alternatively, you can build your own image.
Prebuilt SDK container images are released per supported language during Beam
releases and pushed to Docker Hub.
+ * **PROCESS**: User code is executed by processes that are automatically
started by the runner on each worker node.
+ * **EXTERNAL**: User code will be dispatched to an external service. Use
`environment_config` to specify the address for the external service, e.g.
`localhost:50000`.
Review Comment:
```suggestion
* `EXTERNAL`: dispatches user code to an external service. Use the
`environment_config` parameter to specify the service address, for example,
`localhost:50000`.
```
##########
learning/prompts/documentation-lookup-nolinks/48_sdk_harness.md:
##########
@@ -0,0 +1,19 @@
+Prompt:
+What is SDK Harness in Apache Beam?
+
+Response:
+Interoperability between Apache Beam SDKs and runners is a key aspect of
Apache Beam. The portability framework introduces well-defined,
language-neutral data structures and protocols between the SDK and runner. This
interop layer – called the `Portability API` – ensures that SDKs and runners
can work with each other uniformly, reducing the interoperability burden for
both SDKs and runners to a constant effort.
+
+The portability API consists of a set of smaller contracts that isolate SDKs
and runners for job submission, management and execution. These contracts use
`protobuf`s and `gRPC` for broad language support. All SDKs currently support
the portability framework.
Review Comment:
```suggestion
The portability API consists of a set of smaller contracts that isolate SDKs
and runners for job submission, management, and execution. These contracts
utilize protocols like `protobuf` and `gRPC` to provide broad language support.
Currently, all SDKs support the portability framework.
```
##########
learning/prompts/documentation-lookup-nolinks/48_sdk_harness.md:
##########
@@ -0,0 +1,19 @@
+Prompt:
+What is SDK Harness in Apache Beam?
+
+Response:
+Interoperability between Apache Beam SDKs and runners is a key aspect of
Apache Beam. The portability framework introduces well-defined,
language-neutral data structures and protocols between the SDK and runner. This
interop layer – called the `Portability API` – ensures that SDKs and runners
can work with each other uniformly, reducing the interoperability burden for
both SDKs and runners to a constant effort.
+
+The portability API consists of a set of smaller contracts that isolate SDKs
and runners for job submission, management and execution. These contracts use
`protobuf`s and `gRPC` for broad language support. All SDKs currently support
the portability framework.
+
+The SDK harness is a SDK-provided program responsible for executing user code
and is run separately from the runner. SDK harness initialization relies on the
Provision and `Artifact API`s for obtaining staged files, pipeline options and
environment information.
+
+Apache Beam allows configuration of the SDK harness to accommodate varying
cluster setups:
+
+* **environment_type**: determines where user code will be executed:
+ * **DOCKER**: User code is executed within a container started on each
worker node. This requires docker to be installed on worker nodes (default).
Use `environment_config` to specify the Docker image URL. Official Docker
images are used by default. Alternatively, you can build your own image.
Prebuilt SDK container images are released per supported language during Beam
releases and pushed to Docker Hub.
+ * **PROCESS**: User code is executed by processes that are automatically
started by the runner on each worker node.
+ * **EXTERNAL**: User code will be dispatched to an external service. Use
`environment_config` to specify the address for the external service, e.g.
`localhost:50000`.
+ * **LOOPBACK**: User code is executed within the same process that submitted
the pipeline.
Review Comment:
```suggestion
* `LOOPBACK`: executes user code within the same process that submitted
the pipeline.
```
##########
learning/prompts/documentation-lookup-nolinks/49_transform_service.md:
##########
@@ -0,0 +1,58 @@
+Prompt:
+What is a Transformer Service in Apache Beam?
+
+Response:
+The `Transform service` is Docker Compose service included into Apache Beam
SDK versions 2.49.0 and later. You can use Transform service to upgrade or
downgrade Beam SDK version of individual supported transforms in your pipeline
without changing the Beam version of the pipeline. This feature is only
supported for Beam Java SDK 2.53.0 and later.
+
+To use this feature simply execute Java pipeline with additional pipeline
options specifying the URNs of the transforms you want to upgrade and the Beam
version you want to upgrade to:
+
+```java
+--transformsToOverride=beam:transform:org.apache.beam:bigquery_read:v1
--transformServiceBeamVersion=2.xy.z
+```
+
+In the above example, `--transformsToOverride` specifies the URN of the
transform you want to upgrade or downgrade, and `--transformServiceBeamVersion`
specifies the Beam version you want to upgrade to.
Review Comment:
```suggestion
In the provided example, `--transformsToOverride` specifies the URN of the
transform to upgrade or downgrade, while `--transformServiceBeamVersion`
specifies the target Beam version.
```
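For context, the two options discussed in this hunk are appended to an otherwise ordinary Java pipeline invocation; a hedged sketch (the jar name, runner, and version number are illustrative):

```
# Upgrade only the BigQuery read transform to the specified Beam version
# while the rest of the pipeline stays on its current Beam version.
# Docker must be installed on the machine that starts the service.
java -jar my-pipeline.jar \
  --runner=DirectRunner \
  --transformsToOverride=beam:transform:org.apache.beam:bigquery_read:v1 \
  --transformServiceBeamVersion=2.53.0
```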
##########
learning/prompts/documentation-lookup-nolinks/48_sdk_harness.md:
##########
@@ -0,0 +1,19 @@
+Prompt:
+What is SDK Harness in Apache Beam?
+
+Response:
+Interoperability between Apache Beam SDKs and runners is a key aspect of
Apache Beam. The portability framework introduces well-defined,
language-neutral data structures and protocols between the SDK and runner. This
interop layer – called the `Portability API` – ensures that SDKs and runners
can work with each other uniformly, reducing the interoperability burden for
both SDKs and runners to a constant effort.
+
+The portability API consists of a set of smaller contracts that isolate SDKs
and runners for job submission, management and execution. These contracts use
`protobuf`s and `gRPC` for broad language support. All SDKs currently support
the portability framework.
+
+The SDK harness is a SDK-provided program responsible for executing user code
and is run separately from the runner. SDK harness initialization relies on the
Provision and `Artifact API`s for obtaining staged files, pipeline options and
environment information.
+
+Apache Beam allows configuration of the SDK harness to accommodate varying
cluster setups:
+
+* **environment_type**: determines where user code will be executed:
Review Comment:
```suggestion
1. **`environment_type`**: determines where user code is executed. The
`environment_config` parameter configures the environment based on the value of
`environment_type`:
```
##########
learning/prompts/documentation-lookup-nolinks/49_transform_service.md:
##########
@@ -0,0 +1,58 @@
+Prompt:
+What is a Transformer Service in Apache Beam?
+
+Response:
+The `Transform service` is Docker Compose service included into Apache Beam
SDK versions 2.49.0 and later. You can use Transform service to upgrade or
downgrade Beam SDK version of individual supported transforms in your pipeline
without changing the Beam version of the pipeline. This feature is only
supported for Beam Java SDK 2.53.0 and later.
+
+To use this feature simply execute Java pipeline with additional pipeline
options specifying the URNs of the transforms you want to upgrade and the Beam
version you want to upgrade to:
+
+```java
+--transformsToOverride=beam:transform:org.apache.beam:bigquery_read:v1
--transformServiceBeamVersion=2.xy.z
+```
+
+In the above example, `--transformsToOverride` specifies the URN of the
transform you want to upgrade or downgrade, and `--transformServiceBeamVersion`
specifies the Beam version you want to upgrade to.
+
+The framework will automatically download the specified version of Docker
containers for the transforms and use them in the pipeline. You must have
Docker installed on the machine that starts the service.
+
+Currently the following transforms are supported:
+
Review Comment:
```suggestion
```
##########
learning/prompts/documentation-lookup-nolinks/49_transform_service.md:
##########
@@ -0,0 +1,58 @@
+Prompt:
+What is a Transformer Service in Apache Beam?
+
+Response:
+The `Transform service` is Docker Compose service included into Apache Beam
SDK versions 2.49.0 and later. You can use Transform service to upgrade or
downgrade Beam SDK version of individual supported transforms in your pipeline
without changing the Beam version of the pipeline. This feature is only
supported for Beam Java SDK 2.53.0 and later.
+
+To use this feature simply execute Java pipeline with additional pipeline
options specifying the URNs of the transforms you want to upgrade and the Beam
version you want to upgrade to:
+
+```java
+--transformsToOverride=beam:transform:org.apache.beam:bigquery_read:v1
--transformServiceBeamVersion=2.xy.z
+```
+
+In the above example, `--transformsToOverride` specifies the URN of the
transform you want to upgrade or downgrade, and `--transformServiceBeamVersion`
specifies the Beam version you want to upgrade to.
+
+The framework will automatically download the specified version of Docker
containers for the transforms and use them in the pipeline. You must have
Docker installed on the machine that starts the service.
+
+Currently the following transforms are supported:
Review Comment:
```suggestion
Currently, the Transform service can upgrade the following transforms:
```
##########
learning/prompts/documentation-lookup-nolinks/49_transform_service.md:
##########
@@ -0,0 +1,58 @@
+Prompt:
+What is a Transformer Service in Apache Beam?
+
+Response:
+The `Transform service` is Docker Compose service included into Apache Beam
SDK versions 2.49.0 and later. You can use Transform service to upgrade or
downgrade Beam SDK version of individual supported transforms in your pipeline
without changing the Beam version of the pipeline. This feature is only
supported for Beam Java SDK 2.53.0 and later.
Review Comment:
```suggestion
The Transform service is a Docker Compose service included in Apache Beam
SDK versions 2.49.0 and later. It enables you to upgrade or downgrade the Beam
SDK version of individual supported transforms in your pipeline without
changing the overall Beam version of the pipeline. Additionally, you can
utilize the Transform service to create and execute multi-language pipelines
without needing to install support for additional language runtimes.
```
##########
learning/prompts/documentation-lookup/49_transform_service.md:
##########
@@ -0,0 +1,58 @@
+Prompt:
+What is a Transformer Service in Apache Beam?
+
+Response:
+The `Transform service` is [Docker Compose
service](https://docs.docker.com/compose/) included into Apache Beam SDK
versions 2.49.0 and later. You can use Transform service to upgrade or
downgrade Beam SDK version of individual supported transforms in your pipeline
without changing the Beam version of the pipeline. This feature is only
supported for Beam Java SDK 2.53.0 and later.
+
+To use this feature simply execute Java pipeline with additional pipeline
options specifying the URNs of the transforms you want to upgrade and the Beam
version you want to upgrade to:
+
+```java
+--transformsToOverride=beam:transform:org.apache.beam:bigquery_read:v1
--transformServiceBeamVersion=2.xy.z
+```
+
+In the above example, `--transformsToOverride` specifies the URN of the
transform you want to upgrade or downgrade, and `--transformServiceBeamVersion`
specifies the Beam version you want to upgrade to.
+
+The framework will automatically download the specified version of Docker
containers for the transforms and use them in the pipeline. You must have
Docker installed on the machine that starts the service.
+
+Currently the following transforms are supported:
+
+* BigQuery read transform: `beam:transform:org.apache.beam:bigquery_read:v1`
+* BigQuery write transform: `beam:transform:org.apache.beam:bigquery_write:v1`
+* Kafka read transform:
`beam:transform:org.apache.beam:kafka_read_with_metadata:v2`
+* Kafka write transform: `beam:transform:org.apache.beam:kafka_write:v2`
+
+Transform service implement Beam expansion API. This means you can use the
Transform service to construct and execute multi-language pipelines. For
example, you can build a Python pipeline that uses Java `KafkaIO` transform and
execute in without installing Java locally.
+
+Transform service can be started automatically by Apache Beam SDK or manually
by users:
Review Comment:
```suggestion
In some cases, Apache Beam SDKs can start the Transform service
automatically, provided that Docker is available locally. You can also start
the Transform service manually by running the following command:
```
##########
learning/prompts/documentation-lookup-nolinks/49_transform_service.md:
##########
@@ -0,0 +1,58 @@
+Prompt:
+What is a Transformer Service in Apache Beam?
Review Comment:
```suggestion
What is Transform service in Apache Beam?
```
##########
learning/prompts/documentation-lookup-nolinks/49_transform_service.md:
##########
@@ -0,0 +1,58 @@
+Prompt:
+What is a Transformer Service in Apache Beam?
+
+Response:
+The `Transform service` is Docker Compose service included into Apache Beam
SDK versions 2.49.0 and later. You can use Transform service to upgrade or
downgrade Beam SDK version of individual supported transforms in your pipeline
without changing the Beam version of the pipeline. This feature is only
supported for Beam Java SDK 2.53.0 and later.
+
+To use this feature simply execute Java pipeline with additional pipeline
options specifying the URNs of the transforms you want to upgrade and the Beam
version you want to upgrade to:
+
+```java
+--transformsToOverride=beam:transform:org.apache.beam:bigquery_read:v1
--transformServiceBeamVersion=2.xy.z
+```
+
+In the above example, `--transformsToOverride` specifies the URN of the
transform you want to upgrade or downgrade, and `--transformServiceBeamVersion`
specifies the Beam version you want to upgrade to.
+
+The framework will automatically download the specified version of Docker
containers for the transforms and use them in the pipeline. You must have
Docker installed on the machine that starts the service.
+
+Currently the following transforms are supported:
+
+* BigQuery read transform: `beam:transform:org.apache.beam:bigquery_read:v1`
+* BigQuery write transform: `beam:transform:org.apache.beam:bigquery_write:v1`
+* Kafka read transform:
`beam:transform:org.apache.beam:kafka_read_with_metadata:v2`
+* Kafka write transform: `beam:transform:org.apache.beam:kafka_write:v2`
Review Comment:
```suggestion
* BigQuery read: `beam:transform:org.apache.beam:bigquery_read:v1`
* BigQuery write: `beam:transform:org.apache.beam:bigquery_write:v1`
* Kafka read: `beam:transform:org.apache.beam:kafka_read_with_metadata:v2`
* Kafka write: `beam:transform:org.apache.beam:kafka_write:v2`
```
##########
learning/prompts/documentation-lookup-nolinks/49_transform_service.md:
##########
@@ -0,0 +1,58 @@
+Prompt:
+What is a Transformer Service in Apache Beam?
+
+Response:
+The `Transform service` is Docker Compose service included into Apache Beam
SDK versions 2.49.0 and later. You can use Transform service to upgrade or
downgrade Beam SDK version of individual supported transforms in your pipeline
without changing the Beam version of the pipeline. This feature is only
supported for Beam Java SDK 2.53.0 and later.
+
+To use this feature simply execute Java pipeline with additional pipeline
options specifying the URNs of the transforms you want to upgrade and the Beam
version you want to upgrade to:
+
+```java
+--transformsToOverride=beam:transform:org.apache.beam:bigquery_read:v1
--transformServiceBeamVersion=2.xy.z
+```
+
+In the above example, `--transformsToOverride` specifies the URN of the
transform you want to upgrade or downgrade, and `--transformServiceBeamVersion`
specifies the Beam version you want to upgrade to.
+
+The framework will automatically download the specified version of Docker
containers for the transforms and use them in the pipeline. You must have
Docker installed on the machine that starts the service.
+
+Currently the following transforms are supported:
+
+* BigQuery read transform: `beam:transform:org.apache.beam:bigquery_read:v1`
+* BigQuery write transform: `beam:transform:org.apache.beam:bigquery_write:v1`
+* Kafka read transform:
`beam:transform:org.apache.beam:kafka_read_with_metadata:v2`
+* Kafka write transform: `beam:transform:org.apache.beam:kafka_write:v2`
+
+Transform service implement Beam expansion API. This means you can use the
Transform service to construct and execute multi-language pipelines. For
example, you can build a Python pipeline that uses Java `KafkaIO` transform and
execute in without installing Java locally.
+
+Transform service can be started automatically by Apache Beam SDK or manually
by users:
Review Comment:
```suggestion
In some cases, Apache Beam SDKs can start the Transform service
automatically, provided that Docker is available locally. You can also start
the Transform service manually by running the following command:
```
##########
learning/prompts/documentation-lookup-nolinks/49_transform_service.md:
##########
@@ -0,0 +1,58 @@
+Prompt:
+What is a Transformer Service in Apache Beam?
+
+Response:
+The `Transform service` is Docker Compose service included into Apache Beam
SDK versions 2.49.0 and later. You can use Transform service to upgrade or
downgrade Beam SDK version of individual supported transforms in your pipeline
without changing the Beam version of the pipeline. This feature is only
supported for Beam Java SDK 2.53.0 and later.
+
+To use this feature simply execute Java pipeline with additional pipeline
options specifying the URNs of the transforms you want to upgrade and the Beam
version you want to upgrade to:
+
+```java
+--transformsToOverride=beam:transform:org.apache.beam:bigquery_read:v1
--transformServiceBeamVersion=2.xy.z
+```
+
+In the above example, `--transformsToOverride` specifies the URN of the
transform you want to upgrade or downgrade, and `--transformServiceBeamVersion`
specifies the Beam version you want to upgrade to.
+
+The framework will automatically download the specified version of Docker
containers for the transforms and use them in the pipeline. You must have
Docker installed on the machine that starts the service.
Review Comment:
```suggestion
The framework automatically downloads the specified version of Docker
containers for the transforms and uses them in the pipeline. You must have
Docker installed on the machine that starts the service.
```
##########
learning/prompts/documentation-lookup/48_sdk_harness.md:
##########
@@ -0,0 +1,21 @@
+Prompt:
+What is SDK Harness in Apache Beam?
+
+Response:
+Interoperability between Apache Beam SDKs and runners is a key aspect of
Apache Beam. The [portability
framework](https://beam.apache.org/roadmap/portability/) introduces
well-defined, language-neutral data structures and protocols between the SDK
and runner. This interop layer – called the `Portability API` – ensures that
SDKs and runners can work with each other uniformly, reducing the
interoperability burden for both SDKs and runners to a constant effort.
+
+The portability API consists of a set of smaller contracts that isolate SDKs
and runners for job submission, management and execution. These contracts use
`protobuf`s and `gRPC` for broad language support.
Review Comment:
```suggestion
The portability API consists of a set of smaller contracts that isolate SDKs
and runners for job submission, management, and execution. These contracts
utilize protocols like `protobuf` and `gRPC` to provide broad language support.
```
##########
learning/prompts/documentation-lookup/48_sdk_harness.md:
##########
@@ -0,0 +1,21 @@
+Prompt:
+What is SDK Harness in Apache Beam?
+
+Response:
+Interoperability between Apache Beam SDKs and runners is a key aspect of
Apache Beam. The [portability
framework](https://beam.apache.org/roadmap/portability/) introduces
well-defined, language-neutral data structures and protocols between the SDK
and runner. This interop layer – called the `Portability API` – ensures that
SDKs and runners can work with each other uniformly, reducing the
interoperability burden for both SDKs and runners to a constant effort.
Review Comment:
```suggestion
Interoperability between Apache Beam SDKs and runners is a key aspect of
Apache Beam. The [portability
framework](https://beam.apache.org/roadmap/portability/) introduces
well-defined, language-neutral data structures and protocols between the SDK
and runner. This interoperability layer, known as the 'portability API',
ensures that SDKs and runners can seamlessly work with each other, reducing the
interoperability burden for both SDKs and runners to a constant effort.
```
##########
learning/prompts/documentation-lookup-nolinks/48_sdk_harness.md:
##########
@@ -0,0 +1,19 @@
+Prompt:
+What is SDK Harness in Apache Beam?
+
+Response:
+Interoperability between Apache Beam SDKs and runners is a key aspect of
Apache Beam. The portability framework introduces well-defined,
language-neutral data structures and protocols between the SDK and runner. This
interop layer – called the `Portability API` – ensures that SDKs and runners
can work with each other uniformly, reducing the interoperability burden for
both SDKs and runners to a constant effort.
+
+The portability API consists of a set of smaller contracts that isolate SDKs
and runners for job submission, management and execution. These contracts use
`protobuf`s and `gRPC` for broad language support. All SDKs currently support
the portability framework.
+
+The SDK harness is a SDK-provided program responsible for executing user code
and is run separately from the runner. SDK harness initialization relies on the
Provision and `Artifact API`s for obtaining staged files, pipeline options and
environment information.
+
+Apache Beam allows configuration of the SDK harness to accommodate varying
cluster setups:
+
+* **environment_type**: determines where user code will be executed:
+ * **DOCKER**: User code is executed within a container started on each
worker node. This requires docker to be installed on worker nodes (default).
Use `environment_config` to specify the Docker image URL. Official Docker
images are used by default. Alternatively, you can build your own image.
Prebuilt SDK container images are released per supported language during Beam
releases and pushed to Docker Hub.
+ * **PROCESS**: User code is executed by processes that are automatically
started by the runner on each worker node.
+ * **EXTERNAL**: User code will be dispatched to an external service. Use
`environment_config` to specify the address for the external service, e.g.
`localhost:50000`.
+ * **LOOPBACK**: User code is executed within the same process that submitted
the pipeline.
+
Review Comment:
```suggestion
```
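The `environment_type` / `environment_config` pairing discussed in this hunk can be illustrated with a small standalone sketch. This is a hypothetical helper written for illustration only; it is not part of the Apache Beam API:

```python
# Hypothetical validator mirroring the environment_type values described in
# the reviewed text; not part of Apache Beam itself.
VALID_ENVIRONMENT_TYPES = {"DOCKER", "PROCESS", "EXTERNAL", "LOOPBACK"}


def validate_environment(environment_type, environment_config=None):
    """Return the (type, config) pair, enforcing the pairing rules above."""
    if environment_type not in VALID_ENVIRONMENT_TYPES:
        raise ValueError(f"unknown environment_type: {environment_type!r}")
    # EXTERNAL dispatches user code to an external service, so an address
    # such as "localhost:50000" must accompany it.
    if environment_type == "EXTERNAL" and not environment_config:
        raise ValueError("EXTERNAL requires environment_config (service address)")
    return environment_type, environment_config
```

For example, `validate_environment("EXTERNAL", "localhost:50000")` succeeds, while omitting the address raises a `ValueError`.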
##########
learning/prompts/documentation-lookup-nolinks/49_transform_service.md:
##########
@@ -0,0 +1,58 @@
+Prompt:
+What is a Transformer Service in Apache Beam?
+
+Response:
+The `Transform service` is Docker Compose service included into Apache Beam
SDK versions 2.49.0 and later. You can use Transform service to upgrade or
downgrade Beam SDK version of individual supported transforms in your pipeline
without changing the Beam version of the pipeline. This feature is only
supported for Beam Java SDK 2.53.0 and later.
+
+To use this feature simply execute Java pipeline with additional pipeline
options specifying the URNs of the transforms you want to upgrade and the Beam
version you want to upgrade to:
+
+```java
+--transformsToOverride=beam:transform:org.apache.beam:bigquery_read:v1
--transformServiceBeamVersion=2.xy.z
+```
+
+In the above example, `--transformsToOverride` specifies the URN of the
transform you want to upgrade or downgrade, and `--transformServiceBeamVersion`
specifies the Beam version you want to upgrade to.
+
+The framework will automatically download the specified version of Docker
containers for the transforms and use them in the pipeline. You must have
Docker installed on the machine that starts the service.
+
+Currently the following transforms are supported:
+
+* BigQuery read transform: `beam:transform:org.apache.beam:bigquery_read:v1`
+* BigQuery write transform: `beam:transform:org.apache.beam:bigquery_write:v1`
+* Kafka read transform:
`beam:transform:org.apache.beam:kafka_read_with_metadata:v2`
+* Kafka write transform: `beam:transform:org.apache.beam:kafka_write:v2`
+
+Transform service implement Beam expansion API. This means you can use the
Transform service to construct and execute multi-language pipelines. For
example, you can build a Python pipeline that uses Java `KafkaIO` transform and
execute in without installing Java locally.
+
+Transform service can be started automatically by Apache Beam SDK or manually
by users:
+
+```java
+java -jar beam-sdks-java-transform-service-app-<Beam version for the jar>.jar
--port <port> --beam_version <Beam version for the transform service>
--project_name <a unique ID for the transform service> --command up
+```
+
+Beam transform service includes a number of transforms implemented in the
Apache Beam Java and Python SDKs:
+
+* Java transforms: Google Cloud I/O connectors, the Kafka I/O connector, and
the JDBC I/O connector
+* Python transforms: all portable transforms implemented within the Apache
Beam Python SDK, such as RunInference and DataFrame transforms.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Review Comment:
```suggestion
The Beam Transform service includes several portable transforms implemented
in the Apache Beam Java and Python SDKs:
* Java transforms: Google Cloud I/O connectors, the Kafka I/O connector, and
the JDBC I/O connector.
* Python transforms: all portable transforms implemented within the Apache
Beam Python SDK, such as RunInference and DataFrame transforms.
```
##########
learning/prompts/documentation-lookup/48_sdk_harness.md:
##########
@@ -0,0 +1,21 @@
+Prompt:
+What is SDK Harness in Apache Beam?
+
+Response:
+Interoperability between Apache Beam SDKs and runners is a key aspect of
Apache Beam. The [portability
framework](https://beam.apache.org/roadmap/portability/) introduces
well-defined, language-neutral data structures and protocols between the SDK
and runner. This interop layer – called the `Portability API` – ensures that
SDKs and runners can work with each other uniformly, reducing the
interoperability burden for both SDKs and runners to a constant effort.
+
+The portability API consists of a set of smaller contracts that isolate SDKs
and runners for job submission, management and execution. These contracts use
`protobuf`s and `gRPC` for broad language support.
+
+All SDKs currently support the portability framework. See the [Portability
support
table](https://docs.google.com/spreadsheets/d/1KDa_FGn1ShjomGd-UUDOhuh2q73de2tPz6BqHpzqvNI/edit#gid=0)
for details.
+
+The SDK harness is a SDK-provided program responsible for executing user code
and is run separately from the runner. SDK harness initialization relies on the
Provision and `Artifact API`s for obtaining staged files, pipeline options and
environment information.
Review Comment:
```suggestion
The SDK harness is a program responsible for executing user code. This
program is provided by an SDK and runs separately from the runner. SDK harness
initialization relies on the provision and artifact APIs for obtaining staged
files, pipeline options, and environment information.
```
##########
learning/prompts/documentation-lookup/48_sdk_harness.md:
##########
@@ -0,0 +1,21 @@
+Prompt:
+What is SDK Harness in Apache Beam?
+
+Response:
+Interoperability between Apache Beam SDKs and runners is a key aspect of
Apache Beam. The [portability
framework](https://beam.apache.org/roadmap/portability/) introduces
well-defined, language-neutral data structures and protocols between the SDK
and runner. This interop layer – called the `Portability API` – ensures that
SDKs and runners can work with each other uniformly, reducing the
interoperability burden for both SDKs and runners to a constant effort.
+
+The portability API consists of a set of smaller contracts that isolate SDKs
and runners for job submission, management and execution. These contracts use
`protobuf`s and `gRPC` for broad language support.
+
+All SDKs currently support the portability framework. See the [Portability
support
table](https://docs.google.com/spreadsheets/d/1KDa_FGn1ShjomGd-UUDOhuh2q73de2tPz6BqHpzqvNI/edit#gid=0)
for details.
Review Comment:
```suggestion
Currently, all SDKs support the portability framework. For the latest
information on portability support across SDKs, features, and runners, refer to
the [Apache Beam Portability Support
Matrix](https://docs.google.com/spreadsheets/d/1KDa_FGn1ShjomGd-UUDOhuh2q73de2tPz6BqHpzqvNI/edit#gid=0).
```
##########
learning/prompts/documentation-lookup/48_sdk_harness.md:
##########
@@ -0,0 +1,21 @@
+Prompt:
+What is SDK Harness in Apache Beam?
+
+Response:
+Interoperability between Apache Beam SDKs and runners is a key aspect of
Apache Beam. The [portability
framework](https://beam.apache.org/roadmap/portability/) introduces
well-defined, language-neutral data structures and protocols between the SDK
and runner. This interop layer – called the `Portability API` – ensures that
SDKs and runners can work with each other uniformly, reducing the
interoperability burden for both SDKs and runners to a constant effort.
+
+The portability API consists of a set of smaller contracts that isolate SDKs
and runners for job submission, management and execution. These contracts use
`protobuf`s and `gRPC` for broad language support.
+
+All SDKs currently support the portability framework. See the [Portability
support
table](https://docs.google.com/spreadsheets/d/1KDa_FGn1ShjomGd-UUDOhuh2q73de2tPz6BqHpzqvNI/edit#gid=0)
for details.
+
+The SDK harness is a SDK-provided program responsible for executing user code
and is run separately from the runner. SDK harness initialization relies on the
Provision and `Artifact API`s for obtaining staged files, pipeline options and
environment information.
+
+Apache Beam allows configuration of the SDK harness to accommodate varying
cluster setups:
+
Review Comment:
```suggestion
```
##########
learning/prompts/documentation-lookup/48_sdk_harness.md:
##########
@@ -0,0 +1,21 @@
+Prompt:
+What is SDK Harness in Apache Beam?
+
+Response:
+Interoperability between Apache Beam SDKs and runners is a key aspect of
Apache Beam. The [portability
framework](https://beam.apache.org/roadmap/portability/) introduces
well-defined, language-neutral data structures and protocols between the SDK
and runner. This interop layer – called the `Portability API` – ensures that
SDKs and runners can work with each other uniformly, reducing the
interoperability burden for both SDKs and runners to a constant effort.
+
+The portability API consists of a set of smaller contracts that isolate SDKs
and runners for job submission, management and execution. These contracts use
`protobuf`s and `gRPC` for broad language support.
+
+All SDKs currently support the portability framework. See the [Portability
support
table](https://docs.google.com/spreadsheets/d/1KDa_FGn1ShjomGd-UUDOhuh2q73de2tPz6BqHpzqvNI/edit#gid=0)
for details.
+
+The SDK harness is a SDK-provided program responsible for executing user code
and is run separately from the runner. SDK harness initialization relies on the
Provision and `Artifact API`s for obtaining staged files, pipeline options and
environment information.
+
+Apache Beam allows configuration of the SDK harness to accommodate varying
cluster setups:
+
+* **environment_type**: determines where user code will be executed:
Review Comment:
```suggestion
1. **`environment_type`**: determines where user code is executed. The
`environment_config` parameter configures the environment based on the value of
`environment_type`:
```
##########
learning/prompts/documentation-lookup/48_sdk_harness.md:
##########
@@ -0,0 +1,21 @@
+Prompt:
+What is SDK Harness in Apache Beam?
+
+Response:
+Interoperability between Apache Beam SDKs and runners is a key aspect of
Apache Beam. The [portability
framework](https://beam.apache.org/roadmap/portability/) introduces
well-defined, language-neutral data structures and protocols between the SDK
and runner. This interop layer – called the `Portability API` – ensures that
SDKs and runners can work with each other uniformly, reducing the
interoperability burden for both SDKs and runners to a constant effort.
+
+The portability API consists of a set of smaller contracts that isolate SDKs
and runners for job submission, management and execution. These contracts use
`protobuf`s and `gRPC` for broad language support.
+
+All SDKs currently support the portability framework. See the [Portability
support
table](https://docs.google.com/spreadsheets/d/1KDa_FGn1ShjomGd-UUDOhuh2q73de2tPz6BqHpzqvNI/edit#gid=0)
for details.
+
+The SDK harness is a SDK-provided program responsible for executing user code
and is run separately from the runner. SDK harness initialization relies on the
Provision and `Artifact API`s for obtaining staged files, pipeline options and
environment information.
+
+Apache Beam allows configuration of the SDK harness to accommodate varying
cluster setups:
+
+* **environment_type**: determines where user code will be executed:
+ * **DOCKER**: User code is executed within a container started on each
worker node. This requires docker to be installed on worker nodes (default).
Use `environment_config` to specify the Docker image URL. Official Docker
images are used by default. Alternatively, you can build your own image by
following the instructions
[here](https://beam.apache.org/documentation/runtime/environments/). Prebuilt
SDK container images are released per supported language during Beam releases
and pushed to [Docker
Hub](https://hub.docker.com/search?q=apache%2Fbeam&type=image).
+ * **PROCESS**: User code is executed by processes that are automatically
started by the runner on each worker node.
+ * **EXTERNAL**: User code will be dispatched to an external service. Use
`environment_config` to specify the address for the external service, e.g.
`localhost:50000`.
+ * **LOOPBACK**: User code is executed within the same process that submitted
the pipeline.
Review Comment:
```suggestion
* `LOOPBACK`: executes user code within the same process that submitted
the pipeline.
```
##########
learning/prompts/documentation-lookup/48_sdk_harness.md:
##########
@@ -0,0 +1,21 @@
+Prompt:
+What is SDK Harness in Apache Beam?
+
+Response:
+Interoperability between Apache Beam SDKs and runners is a key aspect of
Apache Beam. The [portability
framework](https://beam.apache.org/roadmap/portability/) introduces
well-defined, language-neutral data structures and protocols between the SDK
and runner. This interop layer – called the `Portability API` – ensures that
SDKs and runners can work with each other uniformly, reducing the
interoperability burden for both SDKs and runners to a constant effort.
+
+The portability API consists of a set of smaller contracts that isolate SDKs
and runners for job submission, management and execution. These contracts use
`protobuf`s and `gRPC` for broad language support.
+
+All SDKs currently support the portability framework. See the [Portability
support
table](https://docs.google.com/spreadsheets/d/1KDa_FGn1ShjomGd-UUDOhuh2q73de2tPz6BqHpzqvNI/edit#gid=0)
for details.
+
+The SDK harness is a SDK-provided program responsible for executing user code
and is run separately from the runner. SDK harness initialization relies on the
Provision and `Artifact API`s for obtaining staged files, pipeline options and
environment information.
+
+Apache Beam allows configuration of the SDK harness to accommodate varying
cluster setups:
+
+* **environment_type**: determines where user code will be executed:
+ * **DOCKER**: User code is executed within a container started on each
worker node. This requires docker to be installed on worker nodes (default).
Use `environment_config` to specify the Docker image URL. Official Docker
images are used by default. Alternatively, you can build your own image by
following the instructions
[here](https://beam.apache.org/documentation/runtime/environments/). Prebuilt
SDK container images are released per supported language during Beam releases
and pushed to [Docker
Hub](https://hub.docker.com/search?q=apache%2Fbeam&type=image).
+ * **PROCESS**: User code is executed by processes that are automatically
started by the runner on each worker node.
Review Comment:
```suggestion
* `PROCESS`: executes user code through processes that are automatically
initiated by the runner on each worker node.
```
##########
learning/prompts/documentation-lookup/48_sdk_harness.md:
##########
@@ -0,0 +1,21 @@
+Prompt:
+What is SDK Harness in Apache Beam?
+
+Response:
+Interoperability between Apache Beam SDKs and runners is a key aspect of
Apache Beam. The [portability
framework](https://beam.apache.org/roadmap/portability/) introduces
well-defined, language-neutral data structures and protocols between the SDK
and runner. This interop layer – called the `Portability API` – ensures that
SDKs and runners can work with each other uniformly, reducing the
interoperability burden for both SDKs and runners to a constant effort.
+
+The portability API consists of a set of smaller contracts that isolate SDKs
and runners for job submission, management and execution. These contracts use
`protobuf`s and `gRPC` for broad language support.
+
+All SDKs currently support the portability framework. See the [Portability
support
table](https://docs.google.com/spreadsheets/d/1KDa_FGn1ShjomGd-UUDOhuh2q73de2tPz6BqHpzqvNI/edit#gid=0)
for details.
+
+The SDK harness is a SDK-provided program responsible for executing user code
and is run separately from the runner. SDK harness initialization relies on the
Provision and `Artifact API`s for obtaining staged files, pipeline options and
environment information.
+
+Apache Beam allows configuration of the SDK harness to accommodate varying
cluster setups:
+
+* **environment_type**: determines where user code will be executed:
+ * **DOCKER**: User code is executed within a container started on each
worker node. This requires docker to be installed on worker nodes (default).
Use `environment_config` to specify the Docker image URL. Official Docker
images are used by default. Alternatively, you can build your own image by
following the instructions
[here](https://beam.apache.org/documentation/runtime/environments/). Prebuilt
SDK container images are released per supported language during Beam releases
and pushed to [Docker
Hub](https://hub.docker.com/search?q=apache%2Fbeam&type=image).
Review Comment:
```suggestion
* `DOCKER`: executes user code within a container on each worker node.
Docker must be installed on worker nodes. You can specify the Docker image URL
using the `environment_config` parameter. Prebuilt SDK container images are
available with each Apache Beam release and pushed to [Docker
Hub](https://hub.docker.com/search?q=apache%2Fbeam&type=image). You can also
[build your custom
image](https://beam.apache.org/documentation/runtime/environments/).
```
##########
learning/prompts/documentation-lookup/48_sdk_harness.md:
##########
@@ -0,0 +1,21 @@
+Prompt:
+What is SDK Harness in Apache Beam?
+
+Response:
+Interoperability between Apache Beam SDKs and runners is a key aspect of
Apache Beam. The [portability
framework](https://beam.apache.org/roadmap/portability/) introduces
well-defined, language-neutral data structures and protocols between the SDK
and runner. This interop layer – called the `Portability API` – ensures that
SDKs and runners can work with each other uniformly, reducing the
interoperability burden for both SDKs and runners to a constant effort.
+
+The Portability API consists of a set of smaller contracts that isolate SDKs
and runners for job submission, management, and execution. These contracts use
`protobuf`s and `gRPC` for broad language support.
+
+All SDKs currently support the portability framework. See the [Portability
support
table](https://docs.google.com/spreadsheets/d/1KDa_FGn1ShjomGd-UUDOhuh2q73de2tPz6BqHpzqvNI/edit#gid=0)
for details.
+
+The SDK harness is an SDK-provided program responsible for executing user code
and is run separately from the runner. SDK harness initialization relies on the
`Provision API` and `Artifact API` to obtain staged files, pipeline options,
and environment information.
+
+Apache Beam allows configuration of the SDK harness to accommodate varying
cluster setups:
Review Comment:
```suggestion
Apache Beam offers configuration options for the SDK harness to cater to
diverse cluster setups. These options include:
```
##########
learning/prompts/documentation-lookup/49_transform_service.md:
##########
@@ -0,0 +1,58 @@
+Prompt:
+What is a Transformer Service in Apache Beam?
Review Comment:
```suggestion
What is Transform service in Apache Beam?
```
##########
learning/prompts/documentation-lookup/48_sdk_harness.md:
##########
@@ -0,0 +1,21 @@
+Prompt:
+What is SDK Harness in Apache Beam?
+
+Response:
+Interoperability between Apache Beam SDKs and runners is a key aspect of
Apache Beam. The [portability
framework](https://beam.apache.org/roadmap/portability/) introduces
well-defined, language-neutral data structures and protocols between the SDK
and runner. This interop layer – called the `Portability API` – ensures that
SDKs and runners can work with each other uniformly, reducing the
interoperability burden for both SDKs and runners to a constant effort.
+
+The Portability API consists of a set of smaller contracts that isolate SDKs
and runners for job submission, management, and execution. These contracts use
`protobuf`s and `gRPC` for broad language support.
+
+All SDKs currently support the portability framework. See the [Portability
support
table](https://docs.google.com/spreadsheets/d/1KDa_FGn1ShjomGd-UUDOhuh2q73de2tPz6BqHpzqvNI/edit#gid=0)
for details.
+
+The SDK harness is an SDK-provided program responsible for executing user code
and is run separately from the runner. SDK harness initialization relies on the
`Provision API` and `Artifact API` to obtain staged files, pipeline options,
and environment information.
+
+Apache Beam allows configuration of the SDK harness to accommodate varying
cluster setups:
+
+* **environment_type**: determines where user code will be executed:
+ * **DOCKER**: User code is executed within a container started on each
worker node. This requires docker to be installed on worker nodes (default).
Use `environment_config` to specify the Docker image URL. Official Docker
images are used by default. Alternatively, you can build your own image by
following the instructions
[here](https://beam.apache.org/documentation/runtime/environments/). Prebuilt
SDK container images are released per supported language during Beam releases
and pushed to [Docker
Hub](https://hub.docker.com/search?q=apache%2Fbeam&type=image).
+ * **PROCESS**: User code is executed by processes that are automatically
started by the runner on each worker node.
+ * **EXTERNAL**: User code will be dispatched to an external service. Use
`environment_config` to specify the address for the external service, e.g.
`localhost:50000`.
Review Comment:
```suggestion
* `EXTERNAL`: dispatches user code to an external service. Use the
`environment_config` parameter to specify the service address, for example,
`localhost:50000`.
```
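As an illustrative sketch only (not Beam's actual option parser), the three `environment_type` values and the companion `environment_config` parameter described above could be modeled like this; the helper name `parse_harness_options` is hypothetical:

```python
import argparse

# The three environment types from the text above: DOCKER (default),
# PROCESS, and EXTERNAL.
VALID_ENVIRONMENT_TYPES = {"DOCKER", "PROCESS", "EXTERNAL"}

def parse_harness_options(argv):
    """Hypothetical helper mimicking how a portable runner reads these flags."""
    parser = argparse.ArgumentParser()
    # Where user code runs: a Docker container, a runner-started process,
    # or an external service.
    parser.add_argument("--environment_type", default="DOCKER",
                        choices=sorted(VALID_ENVIRONMENT_TYPES))
    # Meaning depends on environment_type: a Docker image URL for DOCKER,
    # or a service address such as localhost:50000 for EXTERNAL.
    parser.add_argument("--environment_config", default=None)
    return parser.parse_args(argv)

opts = parse_harness_options(
    ["--environment_type", "EXTERNAL",
     "--environment_config", "localhost:50000"])
print(opts.environment_type, opts.environment_config)
```

Note that with no flags supplied, the sketch falls back to `DOCKER`, matching the default behavior described in the text.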
##########
learning/prompts/documentation-lookup/49_transform_service.md:
##########
@@ -0,0 +1,58 @@
+Prompt:
+What is a Transformer Service in Apache Beam?
+
+Response:
+The `Transform service` is a [Docker Compose
service](https://docs.docker.com/compose/) included in Apache Beam SDK
versions 2.49.0 and later. You can use the Transform service to upgrade or
downgrade the Beam SDK version of individual supported transforms in your
pipeline without changing the overall Beam version of the pipeline. This
feature is only supported for Beam Java SDK 2.53.0 and later.
+
+To use this feature simply execute Java pipeline with additional pipeline
options specifying the URNs of the transforms you want to upgrade and the Beam
version you want to upgrade to:
+
+```java
+--transformsToOverride=beam:transform:org.apache.beam:bigquery_read:v1
--transformServiceBeamVersion=2.xy.z
+```
+
+In the above example, `--transformsToOverride` specifies the URN of the
transform you want to upgrade or downgrade, and `--transformServiceBeamVersion`
specifies the Beam version you want to upgrade to.
Review Comment:
```suggestion
In the provided example, `--transformsToOverride` specifies the URN of the
transform to upgrade or downgrade, while `--transformServiceBeamVersion`
specifies the target Beam version.
```
##########
learning/prompts/documentation-lookup/49_transform_service.md:
##########
@@ -0,0 +1,58 @@
+Prompt:
+What is a Transformer Service in Apache Beam?
+
+Response:
+The `Transform service` is a [Docker Compose
service](https://docs.docker.com/compose/) included in Apache Beam SDK
versions 2.49.0 and later. You can use the Transform service to upgrade or
downgrade the Beam SDK version of individual supported transforms in your
pipeline without changing the overall Beam version of the pipeline. This
feature is only supported for Beam Java SDK 2.53.0 and later.
+
+To use this feature simply execute Java pipeline with additional pipeline
options specifying the URNs of the transforms you want to upgrade and the Beam
version you want to upgrade to:
Review Comment:
```suggestion
You can utilize the Transform service to upgrade specific transforms only if
you are using Beam Java SDK 2.53.0 and later. To employ this feature, execute a
Java pipeline with additional pipeline options specifying the URNs of the
transforms you want to upgrade and the desired Apache Beam version:
```
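As a minimal sketch of the two pipeline options in question (this mimics flag parsing only and is not Beam's implementation; the 2.53.0 target version is taken from the text above), the override flags could be exercised like this:

```python
import argparse

# Illustrative parser for the two Transform service options described above.
parser = argparse.ArgumentParser()
# URN of the transform to upgrade or downgrade.
parser.add_argument("--transformsToOverride")
# Target Beam version for the overridden transform.
parser.add_argument("--transformServiceBeamVersion")

args = parser.parse_args([
    "--transformsToOverride=beam:transform:org.apache.beam:bigquery_read:v1",
    "--transformServiceBeamVersion=2.53.0",
])
print(args.transformsToOverride, args.transformServiceBeamVersion)
```

In a real pipeline these flags would be passed on the Java pipeline's command line, as shown in the quoted snippet.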
##########
learning/prompts/documentation-lookup/49_transform_service.md:
##########
@@ -0,0 +1,58 @@
+Prompt:
+What is a Transformer Service in Apache Beam?
+
+Response:
+The `Transform service` is a [Docker Compose
service](https://docs.docker.com/compose/) included in Apache Beam SDK
versions 2.49.0 and later. You can use the Transform service to upgrade or
downgrade the Beam SDK version of individual supported transforms in your
pipeline without changing the overall Beam version of the pipeline. This
feature is only supported for Beam Java SDK 2.53.0 and later.
+
+To use this feature simply execute Java pipeline with additional pipeline
options specifying the URNs of the transforms you want to upgrade and the Beam
version you want to upgrade to:
+
+```java
+--transformsToOverride=beam:transform:org.apache.beam:bigquery_read:v1
--transformServiceBeamVersion=2.xy.z
+```
+
+In the above example, `--transformsToOverride` specifies the URN of the
transform you want to upgrade or downgrade, and `--transformServiceBeamVersion`
specifies the Beam version you want to upgrade to.
+
+The framework will automatically download the specified version of Docker
containers for the transforms and use them in the pipeline. You must have
Docker installed on the machine that starts the service.
+
+Currently the following transforms are supported:
+
+* BigQuery read transform: `beam:transform:org.apache.beam:bigquery_read:v1`
+* BigQuery write transform: `beam:transform:org.apache.beam:bigquery_write:v1`
+* Kafka read transform:
`beam:transform:org.apache.beam:kafka_read_with_metadata:v2`
+* Kafka write transform: `beam:transform:org.apache.beam:kafka_write:v2`
+
+The Transform service implements the Beam expansion API. This means you can
use the Transform service to construct and execute multi-language pipelines.
For example, you can build a Python pipeline that uses the Java `KafkaIO`
transform and execute it without installing Java locally.
+
+The Transform service can be started automatically by the Apache Beam SDK or
manually by users:
+
+```java
+java -jar beam-sdks-java-transform-service-app-<Beam version for the jar>.jar
--port <port> --beam_version <Beam version for the transform service>
--project_name <a unique ID for the transform service> --command up
+```
+
+Beam transform service includes a number of transforms implemented in the
Apache Beam Java and Python SDKs:
+
+* Java transforms: Google Cloud I/O connectors, the Kafka I/O connector, and
the JDBC I/O connector
+* Python transforms: all portable transforms implemented within the Apache
Beam Python SDK, such as RunInference and DataFrame transforms.
+
+
Review Comment:
```suggestion
The Beam Transform service includes several portable transforms implemented
in the Apache Beam Java and Python SDKs:
* Java transforms: Google Cloud I/O connectors, the Kafka I/O connector, and
the JDBC I/O connector.
* Python transforms: all portable transforms implemented within the Apache
Beam Python SDK, such as RunInference and DataFrame transforms.
```
##########
learning/prompts/documentation-lookup/49_transform_service.md:
##########
@@ -0,0 +1,58 @@
+Prompt:
+What is a Transformer Service in Apache Beam?
+
+Response:
+The `Transform service` is a [Docker Compose
service](https://docs.docker.com/compose/) included in Apache Beam SDK
versions 2.49.0 and later. You can use the Transform service to upgrade or
downgrade the Beam SDK version of individual supported transforms in your
pipeline without changing the overall Beam version of the pipeline. This
feature is only supported for Beam Java SDK 2.53.0 and later.
+
+To use this feature simply execute Java pipeline with additional pipeline
options specifying the URNs of the transforms you want to upgrade and the Beam
version you want to upgrade to:
+
+```java
+--transformsToOverride=beam:transform:org.apache.beam:bigquery_read:v1
--transformServiceBeamVersion=2.xy.z
+```
+
+In the above example, `--transformsToOverride` specifies the URN of the
transform you want to upgrade or downgrade, and `--transformServiceBeamVersion`
specifies the Beam version you want to upgrade to.
+
+The framework will automatically download the specified version of Docker
containers for the transforms and use them in the pipeline. You must have
Docker installed on the machine that starts the service.
Review Comment:
```suggestion
The framework automatically downloads the specified version of Docker
containers for the transforms and uses them in the pipeline. You must have
Docker installed on the machine that starts the service.
```
##########
learning/prompts/documentation-lookup/49_transform_service.md:
##########
@@ -0,0 +1,58 @@
+Prompt:
+What is a Transformer Service in Apache Beam?
+
+Response:
+The `Transform service` is a [Docker Compose
service](https://docs.docker.com/compose/) included in Apache Beam SDK
versions 2.49.0 and later. You can use the Transform service to upgrade or
downgrade the Beam SDK version of individual supported transforms in your
pipeline without changing the overall Beam version of the pipeline. This
feature is only supported for Beam Java SDK 2.53.0 and later.
+
+To use this feature simply execute Java pipeline with additional pipeline
options specifying the URNs of the transforms you want to upgrade and the Beam
version you want to upgrade to:
+
+```java
+--transformsToOverride=beam:transform:org.apache.beam:bigquery_read:v1
--transformServiceBeamVersion=2.xy.z
+```
+
+In the above example, `--transformsToOverride` specifies the URN of the
transform you want to upgrade or downgrade, and `--transformServiceBeamVersion`
specifies the Beam version you want to upgrade to.
+
+The framework will automatically download the specified version of Docker
containers for the transforms and use them in the pipeline. You must have
Docker installed on the machine that starts the service.
+
+Currently the following transforms are supported:
+
+* BigQuery read transform: `beam:transform:org.apache.beam:bigquery_read:v1`
+* BigQuery write transform: `beam:transform:org.apache.beam:bigquery_write:v1`
+* Kafka read transform:
`beam:transform:org.apache.beam:kafka_read_with_metadata:v2`
+* Kafka write transform: `beam:transform:org.apache.beam:kafka_write:v2`
+
Review Comment:
```suggestion
* BigQuery read: `beam:transform:org.apache.beam:bigquery_read:v1`
* BigQuery write: `beam:transform:org.apache.beam:bigquery_write:v1`
* Kafka read: `beam:transform:org.apache.beam:kafka_read_with_metadata:v2`
* Kafka write: `beam:transform:org.apache.beam:kafka_write:v2`
```
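The supported transform URNs quoted above can be collected into a small lookup table; this is a convenience sketch for checking eligibility, not part of the Beam API:

```python
# Supported transform URNs from the list above.
SUPPORTED_TRANSFORM_URNS = {
    "bigquery_read": "beam:transform:org.apache.beam:bigquery_read:v1",
    "bigquery_write": "beam:transform:org.apache.beam:bigquery_write:v1",
    "kafka_read": "beam:transform:org.apache.beam:kafka_read_with_metadata:v2",
    "kafka_write": "beam:transform:org.apache.beam:kafka_write:v2",
}

def is_supported(urn):
    """Check whether a transform URN can be overridden via the Transform service."""
    return urn in SUPPORTED_TRANSFORM_URNS.values()

print(is_supported("beam:transform:org.apache.beam:bigquery_read:v1"))
```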
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]