sirenbyte commented on code in PR #26227:
URL: https://github.com/apache/beam/pull/26227#discussion_r1185034234
##########
learning/tour-of-beam/learning-content/cross-language/multi-pipeline/description.md:
##########
@@ -0,0 +1,280 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+### Multi pipeline
+
+Apache Beam is a popular open-source platform for building batch and streaming data processing pipelines. One of the key features of Apache Beam is its ability to support multi-language pipelines. With Apache Beam, you can write different parts of your pipeline in different programming languages, and they can all work together seamlessly.
+
+Apache Beam supports multiple programming languages, including Java, Python, and Go. This makes it possible to use the language that best suits your needs for each part of your pipeline.

Review Comment:
   Done

+
+To build a multi-language pipeline with Apache Beam, you can use the following approach:
+
+Define your pipeline using Apache Beam's SDK in your preferred programming language. This defines the data processing steps that need to be executed.
+
+Use Apache Beam's language-specific SDKs to implement the data processing steps in the appropriate programming languages. For example, you could use Java to process some data, Python to process some other data, and Go to perform a specific computation.
+
+Use Apache Beam's cross-language support to connect the different parts of your pipeline together. Apache Beam provides a common data model and serialization format, so data can be passed seamlessly between different languages.
+
+By using Apache Beam's multi-language support, you can take advantage of the strengths of different programming languages, while still building a unified data processing pipeline. This can be especially useful when working with large datasets, as different languages may have different performance characteristics for different tasks.
+

Review Comment:
   Done

+To create a multi-language pipeline in Apache Beam, follow these steps:
+
+Choose your SDKs: First, decide which programming languages and corresponding SDKs you'd like to use. Apache Beam currently supports Python, Java, and Go SDKs.
+
+Set up the dependencies: Make sure you have installed the necessary dependencies for each language. For instance, you'll need the Beam Python SDK for Python or the Beam Java SDK for Java.
+
+Create a pipeline: Using the primary language of your choice, create a pipeline object using the respective SDK. This pipeline will serve as the main entry point for your multi-language pipeline.
+
+Use cross-language transforms: To execute transforms written in other languages, use the ExternalTransform class (in Python) or the External class (in Java). This allows you to use a transform written in another language as if it were a native transform in your main pipeline. You'll need to provide the appropriate expansion service address for the language of the transform.
+
+{{if (eq .Sdk "java")}}
+
+#### Start an expansion service
+
+When building a job for a multi-language pipeline, Beam uses an expansion service to expand composite transforms. You must have at least one expansion service per remote SDK.

Review Comment:
   Done

+
+In the general case, if you have a supported version of Python installed on your system, you can let `PythonExternalTransform` handle the details of creating and starting up the expansion service. But if you want to customize the environment or use transforms not available in the default Beam SDK, you might need to run your own expansion service.

Review Comment:
   Done
