chamikaramj commented on a change in pull request #16587:
URL: https://github.com/apache/beam/pull/16587#discussion_r790094005
##########
File path: website/www/site/content/en/documentation/programming-guide.md
##########
@@ -6601,31 +6602,31 @@ public class JavaDataGenerator extends
PTransform<PBegin, PCollection<String>> {
. . .
}
- // Following method conforms to the Requirement 2
+ // The following method conforms to requirement 2.
public JavaDataGenerator withJavaDataGeneratorConfig(JavaDataGeneratorConfig
dataConfig) {
return new JavaDataGenerator(this.size, javaDataGeneratorConfig);
}
. . .
}
-{{< /highlight >}}
+```
-To use a Java class that conforms to the above requirement from a Python SDK
pipeline you may do the following.
+For a complete example, see
[JavaDataGenerator](https://github.com/apache/beam/blob/master/examples/multi-language/src/main/java/org/apache/beam/examples/multilanguage/JavaDataGenerator.java).
-* Step 1: create an allowlist file in the _yaml_ format that describes the
Java transform classes and methods that will be directly accessed from Python.
-* Step 2: start an Expansion Service with the `javaClassLookupAllowlistFile`
option passing path to the allowlist defined in Step 1 as the value.
-* Step 3: Use the Python
[JavaExternalTransform](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/external.py)
API to directly
- access Java transforms defined in the allowlist from the Python side.
+To use a Java class that conforms to the above requirements from a Python SDK
pipeline, follow these steps:
-Starting with Beam 2.35.0, Step 1 and 2 may be skipped as described in
corresponding sections below.
+1. Create a _yaml_ allowlist that describes the Java transform classes and
methods that will be directly accessed from Python.
+2. Start an expansion service, using the `javaClassLookupAllowlistFile` option
to pass the path to the allowlist.
+3. Use the Python
[JavaExternalTransform](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/external.py)
API to directly access Java transforms defined in the allowlist from the
Python side.
Review comment:
Please remove extra spaces in "to ... directly".
##########
File path: website/www/site/content/en/documentation/programming-guide.md
##########
@@ -6601,31 +6602,31 @@ public class JavaDataGenerator extends
PTransform<PBegin, PCollection<String>> {
. . .
}
- // Following method conforms to the Requirement 2
+ // The following method conforms to requirement 2.
public JavaDataGenerator withJavaDataGeneratorConfig(JavaDataGeneratorConfig
dataConfig) {
return new JavaDataGenerator(this.size, javaDataGeneratorConfig);
}
. . .
}
-{{< /highlight >}}
+```
-To use a Java class that conforms to the above requirement from a Python SDK
pipeline you may do the following.
+For a complete example, see
[JavaDataGenerator](https://github.com/apache/beam/blob/master/examples/multi-language/src/main/java/org/apache/beam/examples/multilanguage/JavaDataGenerator.java).
-* Step 1: create an allowlist file in the _yaml_ format that describes the
Java transform classes and methods that will be directly accessed from Python.
-* Step 2: start an Expansion Service with the `javaClassLookupAllowlistFile`
option passing path to the allowlist defined in Step 1 as the value.
-* Step 3: Use the Python
[JavaExternalTransform](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/external.py)
API to directly
- access Java transforms defined in the allowlist from the Python side.
+To use a Java class that conforms to the above requirements from a Python SDK
pipeline, follow these steps:
-Starting with Beam 2.35.0, Step 1 and 2 may be skipped as described in
corresponding sections below.
+1. Create a _yaml_ allowlist that describes the Java transform classes and
methods that will be directly accessed from Python.
+2. Start an expansion service, using the `javaClassLookupAllowlistFile` option
to pass the path to the allowlist.
+3. Use the Python
[JavaExternalTransform](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/external.py)
API to directly access Java transforms defined in the allowlist from the
Python side.
-##### Step 1
+Starting with Beam 2.35.0, steps 1 and 2 can be skipped, as described in the
corresponding sections below.
-To use this Java transform from Python, you may define an allowlist file in
the _yaml_ format. This allowlist lists the class names,
+**Step 1**
+
+To use an eligible Java transform from Python, define a _yaml_ allowlist. This
allowlist lists the class names,
constructor methods, and builder methods that are directly available to be
used from the Python side.
-Starting with Beam 2.35.0, you have the option to specify `*` to the
`javaClassLookupAllowlistFile` option instead of defining an actual allowlist
which
-denotes that all supported transforms in the classpath of the expansion
service may be accessed through the API.
+Starting with Beam 2.35.0, you have the option to pass `*` to the
`javaClassLookupAllowlistFile` option instead of defining an actual allowlist.
The `*` specifies that all supported transforms in the classpath of the
expansion service can be accessed through the API.
Review comment:
Also add: "We encourage using an actual allowlist for production usage
since allowing clients to access arbitrary Java classes can pose a security
risk."
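
To make the suggestion concrete, here is a minimal sketch of writing an explicit allowlist instead of passing `*`. It uses only the Python standard library. The `version`, `className`, `allowedConstructorMethods`, and `allowedBuilderMethods` keys are assumptions about the allowlist format (only `allowedClasses` and `withJavaDataGeneratorConfig` appear in the hunks quoted in this thread), and `render_allowlist` is a hypothetical helper, not a Beam API.

```python
from typing import Dict, List


def render_allowlist(classes: Dict[str, Dict[str, List[str]]]) -> str:
    """Render an allowlist as YAML text (hypothetical helper, not part of Beam).

    The key names used here are assumed, not confirmed by the quoted diff.
    """
    lines = ["version: v1", "allowedClasses:"]
    for class_name, methods in classes.items():
        lines.append(f"- className: {class_name}")
        for key in ("allowedConstructorMethods", "allowedBuilderMethods"):
            names = methods.get(key, [])
            if not names:
                continue  # Skip empty sections rather than emit a null value.
            lines.append(f"  {key}:")
            for name in names:
                lines.append(f"    - {name}")
    return "\n".join(lines) + "\n"


# Only the class and methods named here would be reachable from Python.
allowlist_yaml = render_allowlist({
    "my.beam.transforms.JavaDataGenerator": {
        "allowedConstructorMethods": ["create"],
        "allowedBuilderMethods": ["withJavaDataGeneratorConfig"],
    },
})
print(allowlist_yaml)
```

Listing each class and method explicitly keeps the set of reachable Java code small, which is the point of recommending an actual allowlist for production.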
##########
File path: website/www/site/content/en/documentation/programming-guide.md
##########
@@ -6568,28 +6569,28 @@ In this section, we will use
[KafkaIO.Read](https://beam.apache.org/releases/jav
There are two ways to make Java transforms available to other SDKs.
* Option 1: In some cases, you can use existing Java transforms from other
SDKs without writing any additional Java code.
-* Option 2: You can use arbitrary Java Transforms from other SDKs by adding a
few Java classes.
+* Option 2: You can use arbitrary Java transforms from other SDKs by adding a
few Java classes.
-##### 13.1.1.1 Using Existing Java Transforms from Other SDKs Without Writing
more Java Code
+##### 13.1.1.1 Using existing Java transforms
Review comment:
I think this is a bit ambiguous since we use existing Java transforms in
both cases. It's just that for the first case, there's no need to write more
Java code.
##########
File path: website/www/site/content/en/documentation/programming-guide.md
##########
@@ -6601,31 +6602,31 @@ public class JavaDataGenerator extends
PTransform<PBegin, PCollection<String>> {
. . .
}
- // Following method conforms to the Requirement 2
+ // The following method conforms to requirement 2.
public JavaDataGenerator withJavaDataGeneratorConfig(JavaDataGeneratorConfig
dataConfig) {
return new JavaDataGenerator(this.size, javaDataGeneratorConfig);
}
. . .
}
-{{< /highlight >}}
+```
-To use a Java class that conforms to the above requirement from a Python SDK
pipeline you may do the following.
+For a complete example, see
[JavaDataGenerator](https://github.com/apache/beam/blob/master/examples/multi-language/src/main/java/org/apache/beam/examples/multilanguage/JavaDataGenerator.java).
-* Step 1: create an allowlist file in the _yaml_ format that describes the
Java transform classes and methods that will be directly accessed from Python.
-* Step 2: start an Expansion Service with the `javaClassLookupAllowlistFile`
option passing path to the allowlist defined in Step 1 as the value.
-* Step 3: Use the Python
[JavaExternalTransform](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/external.py)
API to directly
- access Java transforms defined in the allowlist from the Python side.
+To use a Java class that conforms to the above requirements from a Python SDK
pipeline, follow these steps:
-Starting with Beam 2.35.0, Step 1 and 2 may be skipped as described in
corresponding sections below.
+1. Create a _yaml_ allowlist that describes the Java transform classes and
methods that will be directly accessed from Python.
+2. Start an expansion service, using the `javaClassLookupAllowlistFile` option
to pass the path to the allowlist.
+3. Use the Python
[JavaExternalTransform](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/external.py)
API to directly access Java transforms defined in the allowlist from the
Python side.
-##### Step 1
+Starting with Beam 2.35.0, steps 1 and 2 can be skipped, as described in the
corresponding sections below.
-To use this Java transform from Python, you may define an allowlist file in
the _yaml_ format. This allowlist lists the class names,
+**Step 1**
+
+To use an eligible Java transform from Python, define a _yaml_ allowlist. This
allowlist lists the class names,
constructor methods, and builder methods that are directly available to be
used from the Python side.
-Starting with Beam 2.35.0, you have the option to specify `*` to the
`javaClassLookupAllowlistFile` option instead of defining an actual allowlist
which
-denotes that all supported transforms in the classpath of the expansion
service may be accessed through the API.
+Starting with Beam 2.35.0, you have the option to pass `*` to the
`javaClassLookupAllowlistFile` option instead of defining an actual allowlist.
The `*` specifies that all supported transforms in the classpath of the
expansion service can be accessed through the API.
Review comment:
Please change to 2.36.0.
##########
File path: website/www/site/content/en/documentation/programming-guide.md
##########
@@ -6601,31 +6602,31 @@ public class JavaDataGenerator extends
PTransform<PBegin, PCollection<String>> {
. . .
}
- // Following method conforms to the Requirement 2
+ // The following method conforms to requirement 2.
public JavaDataGenerator withJavaDataGeneratorConfig(JavaDataGeneratorConfig
dataConfig) {
return new JavaDataGenerator(this.size, javaDataGeneratorConfig);
}
. . .
}
-{{< /highlight >}}
+```
-To use a Java class that conforms to the above requirement from a Python SDK
pipeline you may do the following.
+For a complete example, see
[JavaDataGenerator](https://github.com/apache/beam/blob/master/examples/multi-language/src/main/java/org/apache/beam/examples/multilanguage/JavaDataGenerator.java).
-* Step 1: create an allowlist file in the _yaml_ format that describes the
Java transform classes and methods that will be directly accessed from Python.
-* Step 2: start an Expansion Service with the `javaClassLookupAllowlistFile`
option passing path to the allowlist defined in Step 1 as the value.
-* Step 3: Use the Python
[JavaExternalTransform](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/external.py)
API to directly
- access Java transforms defined in the allowlist from the Python side.
+To use a Java class that conforms to the above requirements from a Python SDK
pipeline, follow these steps:
-Starting with Beam 2.35.0, Step 1 and 2 may be skipped as described in
corresponding sections below.
+1. Create a _yaml_ allowlist that describes the Java transform classes and
methods that will be directly accessed from Python.
+2. Start an expansion service, using the `javaClassLookupAllowlistFile` option
to pass the path to the allowlist.
+3. Use the Python
[JavaExternalTransform](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/external.py)
API to directly access Java transforms defined in the allowlist from the
Python side.
-##### Step 1
+Starting with Beam 2.35.0, steps 1 and 2 can be skipped, as described in the
corresponding sections below.
Review comment:
Please change to Beam 2.36.0 (this was reverted from 2.35.0).
##########
File path: website/www/site/content/en/documentation/programming-guide.md
##########
@@ -6543,23 +6543,24 @@ This is not supported yet, see BEAM-10976.
## 13. Multi-language pipelines {#multi-language-pipelines}
-This section provides comprehensive documentation of multi-language pipelines.
For a short overview of the topic, see:
+This section provides comprehensive documentation of multi-language pipelines.
To get started creating a multi-language pipeline, see:
* [Python multi-language pipelines
quickstart](/documentation/sdks/python-multi-language-pipelines)
+* [Java multi-language pipelines
quickstart](/documentation/sdks/java-multi-language-pipelines)
-Beam allows you to combine transforms written in any supported SDK language
(currently, Java and Python) and use them in one multi-language pipeline. This
capability makes it easy to provide new functionality simultaneously in
different Apache Beam SDKs through a single cross-language transform. For
example, the [Apache Kafka
connector](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/kafka.py)
and [SQL
transform](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/sql.py)
from the Java SDK can be used in Python streaming pipelines.
+Beam lets you combine transforms written in any supported SDK language
(currently, Java and Python) and use them in one multi-language pipeline. This
capability makes it easy to provide new functionality simultaneously in
different Apache Beam SDKs through a single cross-language transform. For
example, the [Apache Kafka
connector](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/kafka.py)
and [SQL
transform](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/sql.py)
from the Java SDK can be used in Python streaming pipelines.
Review comment:
"... used in Python pipelines"
(SQL works for both batch and streaming)
##########
File path: website/www/site/content/en/documentation/programming-guide.md
##########
@@ -6568,28 +6569,28 @@ In this section, we will use
[KafkaIO.Read](https://beam.apache.org/releases/jav
There are two ways to make Java transforms available to other SDKs.
* Option 1: In some cases, you can use existing Java transforms from other
SDKs without writing any additional Java code.
-* Option 2: You can use arbitrary Java Transforms from other SDKs by adding a
few Java classes.
+* Option 2: You can use arbitrary Java transforms from other SDKs by adding a
few Java classes.
-##### 13.1.1.1 Using Existing Java Transforms from Other SDKs Without Writing
more Java Code
+##### 13.1.1.1 Using existing Java transforms
-Starting with Beam 2.34.0, Python SDK users can use some Java transforms
without writing additional Java code. This can be useful in many cases. For
example,
-* A developer not familiar with Java may need to use an existing Java
transform from a Python pipeline
-* A developer may need to make an existing Java transform available to a
Python pipeline without writing/releasing more Java code
+Starting with Beam 2.34.0, Python SDK users can use some Java transforms
without writing additional Java code. This can be useful in many cases. For
example:
+* A developer not familiar with Java may need to use an existing Java
transform from a Python pipeline.
+* A developer may need to make an existing Java transform available to a
Python pipeline without writing/releasing more Java code.
> **Note:** This feature is currently only available when using Java
> transforms from a Python pipeline.
-To be eligible for direct usage, the API of the Java transform has to follow
the following pattern.
-* Requirement 1: The Java transform can be constructed using an available
public constructor or a public static method (a constructor method) in the same
Java class.
-* Requirement 2: The Java transform can be configured using one or more
builder methods. Each builder method should be public and should return an
instance of the Java transform.
+To be eligible for direct usage, the API of the Java transform has to meet the
following requirements:
+1. The Java transform can be constructed using an available public constructor
or a public static method (a constructor method) in the same Java class.
+2. The Java transform can be configured using one or more builder methods.
Each builder method should be public and should return an instance of the Java
transform.
-See below for the structure of an example Java class that can be directly used
from the Python API.
+Here's an example Java class that can be directly used from the Python API.
-{{< highlight >}}
+```java
public class JavaDataGenerator extends PTransform<PBegin, PCollection<String>>
{
. . .
- // Following method satisfies the Requirement 1.
- // Note that you may also use a class constructor instead of a static method.
+ // The following method satisfies requirement 1.
Review comment:
Optional suggestion: Is there a way to inject code from the committed
version of this class [1] instead of copying code here?
(This also applies to other code examples that are available in the repo now.)
[1]
https://github.com/apache/beam/blob/master/examples/multi-language/src/main/java/org/apache/beam/examples/multilanguage/JavaDataGenerator.java
##########
File path: website/www/site/content/en/documentation/programming-guide.md
##########
@@ -6637,118 +6638,120 @@ allowedClasses:
- withJavaDataGeneratorConfig
{{< /highlight >}}
-##### Step 2
+**Step 2**
-The allowlist is provided as an argument when starting up the Java expansion
service. For example, the expansion service can be started
-as a local Java process using the following command.
+Provide the allowlist as an argument when starting up the Java expansion
service. For example, you can start the expansion service
+as a local Java process using the following command:
{{< highlight >}}
java -jar <jar file> <port> --javaClassLookupAllowlistFile=<path to the
allowlist file>
{{< /highlight >}}
-Starting with Beam 2.35.0, Beam ``JavaExternalTransform` API will
automatically startup an expansion service with a given set of `jar` file
dependencies
-if an expansion service address was not provided.
+Starting with Beam 2.35.0, the `JavaExternalTransform` API will automatically
start up an expansion service with a given set of `jar` file dependencies if an
expansion service address was not provided.
-##### Step 3
+**Step 3**
-You can directly use the Java class from your Python pipeline using a stub
transform created using the `JavaExternalTransform` API. This API allows you to
construct the transform
-using the Java class name and allows you to invoke builder methods to
configure the class.
+You can use the Java class directly from your Python pipeline using a stub
transform created from the `JavaExternalTransform` API. This API allows you to
construct the transform using the Java class name and allows you to invoke
builder methods to configure the class.
-Constructor and method parameter types are mapped between Python and Java
using a Beam Schema. The Schema is auto-generated using the object types
-provided on the Python side. If the Java class constructor method or builder
method accepts any complex object types, make sure that the Beam Schema
+Constructor and method parameter types are mapped between Python and Java
using a Beam schema. The schema is auto-generated using the object types
+provided on the Python side. If the Java class constructor method or builder
method accepts any complex object types, make sure that the Beam schema
for these objects is registered and available for the Java expansion service.
If a schema has not been registered, the Java expansion service will
-try to register a schema using
[JavaFieldSchema](https://beam.apache.org/documentation/programming-guide/#creating-schemas).
In Python arbitrary objects
-can be represented using `NamedTuple`s which will be represented as Beam Rows
in the Schema. See below for a Python stub transform that represents the above
-mentioned Java transform.
+try to register a schema using
[JavaFieldSchema](https://beam.apache.org/documentation/programming-guide/#creating-schemas).
In Python, arbitrary objects
+can be represented using `NamedTuple`s, which will be represented as Beam rows
in the schema. Here is a Python stub transform that represents the above
+mentioned Java transform:
-{{< highlight >}}
+```py
JavaDataGeneratorConfig = typing.NamedTuple(
'JavaDataGeneratorConfig', [('prefix', str), ('length', int), ('suffix', str)])
data_config = JavaDataGeneratorConfig(prefix='start', length=20, suffix='end')
java_transform = JavaExternalTransform(
'my.beam.transforms.JavaDataGenerator',
expansion_service='localhost:<port>').create(numpy.int32(100)).withJavaDataGeneratorConfig(data_config)
-{{< /highlight >}}
+```
-This transform can be used in a Python pipeline along with other Python
transforms.
+You can use this transform in a Python pipeline along with other Python
transforms. For a complete example, see
[javadatagenerator.py](https://github.com/apache/beam/blob/master/examples/multi-language/python/javadatagenerator.py).
-##### 13.1.1.2 Full API for Making Existing Java Transforms Available to Other
SDKs
+##### 13.1.1.2 Making existing Java transforms available to other SDKs
Review comment:
Ditto regarding ambiguity. In both cases (13.1.1.1 and 13.1.1.2) we make
existing Java transforms available to other SDKs. The difference is, in
13.1.1.2, users need to write additional Java code just to make such transforms
available to other SDKs.
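
As a side note on the Python stub in the hunk quoted above, here is a small standard-library sketch of the `NamedTuple` it builds, written in class syntax. It only shows that the field names and declared types are available for inspection. How Beam actually derives a schema from them is not shown in this thread, so treat that link as an assumption.

```python
import typing


class JavaDataGeneratorConfig(typing.NamedTuple):
    """Same fields as the functional-syntax NamedTuple in the quoted hunk."""
    prefix: str
    length: int
    suffix: str


data_config = JavaDataGeneratorConfig(prefix="start", length=20, suffix="end")

# Field names and declared types, which schema inference could draw on.
field_types = typing.get_type_hints(JavaDataGeneratorConfig)
print(data_config._fields)
print(field_types)
```

The class syntax is equivalent to the `typing.NamedTuple('JavaDataGeneratorConfig', [...])` call in the guide and may be easier to read when a config has many fields.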
##########
File path: website/www/site/content/en/documentation/programming-guide.md
##########
@@ -6637,118 +6638,120 @@ allowedClasses:
- withJavaDataGeneratorConfig
{{< /highlight >}}
-##### Step 2
+**Step 2**
-The allowlist is provided as an argument when starting up the Java expansion
service. For example, the expansion service can be started
-as a local Java process using the following command.
+Provide the allowlist as an argument when starting up the Java expansion
service. For example, you can start the expansion service
+as a local Java process using the following command:
{{< highlight >}}
java -jar <jar file> <port> --javaClassLookupAllowlistFile=<path to the
allowlist file>
{{< /highlight >}}
-Starting with Beam 2.35.0, Beam ``JavaExternalTransform` API will
automatically startup an expansion service with a given set of `jar` file
dependencies
-if an expansion service address was not provided.
+Starting with Beam 2.35.0, the `JavaExternalTransform` API will automatically
start up an expansion service with a given set of `jar` file dependencies if an
expansion service address was not provided.
Review comment:
2.36.0
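
For reference, the manual startup command in the quoted hunk can be assembled as sketched below. This is a standard-library illustration only: `expansion_service_command` is a hypothetical helper, and the jar name, port, and allowlist path are placeholders rather than values from the PR.

```python
import shlex
from typing import List


def expansion_service_command(jar_path: str, port: int, allowlist: str) -> List[str]:
    """Build argv for the `java -jar ...` command shown in the quoted hunk.

    `allowlist` is the path to the YAML allowlist file.
    """
    return [
        "java", "-jar", jar_path, str(port),
        f"--javaClassLookupAllowlistFile={allowlist}",
    ]


# Placeholder jar name, port, and path, for illustration only.
command = expansion_service_command(
    "expansion-service.jar", 8097, "/tmp/allowlist.yaml")
print(shlex.join(command))
```

Building the command as a list avoids shell-quoting problems if it is later passed to `subprocess.run`.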
##########
File path: website/www/site/content/en/documentation/programming-guide.md
##########
@@ -6637,118 +6638,120 @@ allowedClasses:
- withJavaDataGeneratorConfig
{{< /highlight >}}
-##### Step 2
+**Step 2**
-The allowlist is provided as an argument when starting up the Java expansion
service. For example, the expansion service can be started
-as a local Java process using the following command.
+Provide the allowlist as an argument when starting up the Java expansion
service. For example, you can start the expansion service
+as a local Java process using the following command:
{{< highlight >}}
java -jar <jar file> <port> --javaClassLookupAllowlistFile=<path to the
allowlist file>
{{< /highlight >}}
-Starting with Beam 2.35.0, Beam ``JavaExternalTransform` API will
automatically startup an expansion service with a given set of `jar` file
dependencies
-if an expansion service address was not provided.
+Starting with Beam 2.35.0, the `JavaExternalTransform` API will automatically
start up an expansion service with a given set of `jar` file dependencies if an
expansion service address was not provided.
Review comment:
"with a given \`jar\` file dependency"