chamikaramj commented on code in PR #23837:
URL: https://github.com/apache/beam/pull/23837#discussion_r1006030776
##########
website/www/site/content/en/documentation/programming-guide.md:
##########
@@ -7282,7 +7282,16 @@ To create an SDK wrapper for use in a Python pipeline,
do the following:
#### 13.1.2. Creating cross-language Python transforms
-To make your Python transform usable with different SDK languages, you must
create a Python module that registers an existing Python transform as a
cross-language transform for use with the Python expansion service and calls
into that existing transform to perform its intended operation.
+Any Python transforms defined in the scope of the expansion service should be
accessible by specifying their fully qualified names. For example, you could
use Python's `ReadFromText` transform in a Java pipeline with its fully
qualified name `apache_beam.io.ReadFromText`:
+
+```java
+p.apply("Read",
+ PythonExternalTransform.<PBegin,
PCollection<String>>from("apache_beam.io.ReadFromText")
+ .withKwarg("file_pattern", options.getInputFile())
+ .withKwarg("validate", false))
+```
+
+Alternatively, you may want to create a Python module that registers an
existing Python transform as a cross-language transform for use with the Python
expansion service and calls into that existing transform to perform its
intended operation. A registered URN can be used later in an expansion request
for indicating an expansion target.
Review Comment:
Should we provide an example snippet for this as well?
##########
website/www/site/content/en/documentation/programming-guide.md:
##########
@@ -7282,7 +7282,16 @@ To create an SDK wrapper for use in a Python pipeline,
do the following:
#### 13.1.2. Creating cross-language Python transforms
-To make your Python transform usable with different SDK languages, you must
create a Python module that registers an existing Python transform as a
cross-language transform for use with the Python expansion service and calls
into that existing transform to perform its intended operation.
+Any Python transforms defined in the scope of the expansion service should be
accessible by specifying their fully qualified names. For example, you could
use Python's `ReadFromText` transform in a Java pipeline with its fully
qualified name `apache_beam.io.ReadFromText`:
Review Comment:
This still requires the "from_runner_api_proto" and "to_runner_api_proto"
methods to be available in the Python transform, correct?
If so, we should note that here.
##########
website/www/site/content/en/documentation/programming-guide.md:
##########
@@ -7393,7 +7402,29 @@ Depending on the SDK language of the pipeline, you can
use a high-level SDK-wrap
#### 13.2.1. Using cross-language transforms in a Java pipeline
-Currently, to access cross-language transforms from the Java SDK, you have to
use the lower-level
[External](https://github.com/apache/beam/blob/master/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/External.java)
class.
+Users have three options to use cross-language transforms in a Java pipeline.
At the highest level of abstraction, some popular Python transforms are
accessible through dedicated Java wrapper transforms. For example, the Java SDK
has the `DataframeTransform` class, which uses the Python SDK's
`DataframeTransform`, and it has the `RunInference` class, which uses the
Python SDK's `RunInference`, and so on. When an SDK-specific wrapper transform
is not available for a target Python transform, you can use the lower-level
[PythonExternalTransform](https://github.com/apache/beam/blob/master/sdks/java/extensions/python/src/main/java/org/apache/beam/sdk/extensions/python/PythonExternalTransform.java)
class instead by specifying the fully qualified name of the Python transform.
If you want to try external transforms from SDKs other than Python, you can
also use the lowest-level
[External](https://github.com/apache/beam/blob/master/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/External.java) class.
Review Comment:
We should try to make this less specific to one language combination (Python
from Java) if possible. Probably structure it as:
\#\#\#\#\# Simplified APIs for using Python transforms from Java
\#\#\#\#\# Generic API for using arbitrary transforms from Java
Also, please mention that the latter applies to Java-on-Java as well.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.