ihji commented on code in PR #23837:
URL: https://github.com/apache/beam/pull/23837#discussion_r1006080365

##########
website/www/site/content/en/documentation/programming-guide.md:
##########
@@ -7282,7 +7282,16 @@ To create an SDK wrapper for use in a Python pipeline, do the following:

 #### 13.1.2. Creating cross-language Python transforms

-To make your Python transform usable with different SDK languages, you must create a Python module that registers an existing Python transform as a cross-language transform for use with the Python expansion service and calls into that existing transform to perform its intended operation.
+Any Python transforms defined in the scope of the expansion service should be accessible by specifying their fully qualified names. For example, you could use Python's `ReadFromText` transform in a Java pipeline with its fully qualified name `apache_beam.io.ReadFromText`:
+
+```java

Review Comment:
   I think it deserves to be a separate PR.

##########
website/www/site/content/en/documentation/programming-guide.md:
##########
@@ -7393,7 +7402,29 @@ Depending on the SDK language of the pipeline, you can use a high-level SDK-wrap

 #### 13.2.1. Using cross-language transforms in a Java pipeline

-Currently, to access cross-language transforms from the Java SDK, you have to use the lower-level [External](https://github.com/apache/beam/blob/master/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/External.java) class.
+Users have three options to use cross-language transforms in a Java pipeline. At the highest level of abstraction, some popular Python transforms are accessible through dedicated Java wrapper transforms. For example, the Java SDK has the `DataframeTransform` class, which uses the Python SDK's `DataframeTransform`, and it has the `RunInference` class, which uses the Python SDK's `RunInference`, and so on.
When an SDK-specific wrapper transform is not available for a target Python transform, you can use the lower-level [PythonExternalTransform](https://github.com/apache/beam/blob/master/sdks/java/extensions/python/src/main/java/org/apache/beam/sdk/extensions/python/PythonExternalTransform.java) class instead by specifying the fully qualified name of the Python transform. If you want to try external transforms from SDKs other than Python, you can also use the lowest-level [External](https://github.com/apache/beam/blob/master/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/External.java) class.

Review Comment:
   Users should use `External.of` with a raw `ExpansionRequest` proto to expand external transforms from SDKs other than Python. We don't have enough utility functions for such use cases. While it's possible to use arbitrary transforms from any SDK, I think we don't need to emphasize it in the doc by creating a separate subsection.

##########
website/www/site/content/en/documentation/programming-guide.md:
##########
@@ -7282,7 +7282,16 @@ To create an SDK wrapper for use in a Python pipeline, do the following:

 #### 13.1.2. Creating cross-language Python transforms

-To make your Python transform usable with different SDK languages, you must create a Python module that registers an existing Python transform as a cross-language transform for use with the Python expansion service and calls into that existing transform to perform its intended operation.
+Any Python transforms defined in the scope of the expansion service should be accessible by specifying their fully qualified names.
For example, you could use Python's `ReadFromText` transform in a Java pipeline with its fully qualified name `apache_beam.io.ReadFromText`:
+
+```java
+p.apply("Read",
+    PythonExternalTransform.<PBegin, PCollection<String>>from("apache_beam.io.ReadFromText")
+    .withKwarg("file_pattern", options.getInputFile())
+    .withKwarg("validate", false))
+```
+
+Alternatively, you may want to create a Python module that registers an existing Python transform as a cross-language transform for use with the Python expansion service and calls into that existing transform to perform its intended operation. A registered URN can be used later in an expansion request for indicating an expansion target.

Review Comment:
   The following sections already provide step-by-step examples for registering existing transforms. This sentence just introduces what comes next.

##########
website/www/site/content/en/documentation/programming-guide.md:
##########
@@ -7282,7 +7282,16 @@ To create an SDK wrapper for use in a Python pipeline, do the following:

 #### 13.1.2. Creating cross-language Python transforms

-To make your Python transform usable with different SDK languages, you must create a Python module that registers an existing Python transform as a cross-language transform for use with the Python expansion service and calls into that existing transform to perform its intended operation.
+Any Python transforms defined in the scope of the expansion service should be accessible by specifying their fully qualified names. For example, you could use Python's `ReadFromText` transform in a Java pipeline with its fully qualified name `apache_beam.io.ReadFromText`:

Review Comment:
   You don't need those methods in every transform since we have a proxy class here for looking up and wiring the fully qualified transform classes.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
