rszper commented on code in PR #32735:
URL: https://github.com/apache/beam/pull/32735#discussion_r1797280373

##########
website/www/site/content/en/blog/beam-yaml-proto.md:
##########
@@ -0,0 +1,273 @@
+---
+title: "Efficient Streaming Data Processing with Beam YAML and Protobuf"
+date: "2024-09-20T11:53:38+02:00"
+categories:
+  - blog
+authors:
+  - ffernandez92
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+As streaming data processing grows, so do its maintenance, complexity, and costs.
+This post explains how to efficiently scale pipelines by using [Protobuf](https://protobuf.dev/),
+which ensures that pipelines are reusable and quick to deploy. The goal is to keep this process simple
+for engineers to implement using [Beam YAML](https://beam.apache.org/documentation/sdks/yaml/).
+
+<!--more-->
+
+## Simplify pipelines with Beam YAML
+
+Creating a pipeline in Beam can be somewhat difficult, especially for new Apache Beam users.
+Setting up the project, managing dependencies, and so on can be challenging.
+By using Beam YAML, you can eliminate most of the boilerplate code,

Review Comment:
   ```suggestion
   Beam YAML eliminates most of the boilerplate code,
   ```

##########
website/www/site/content/en/blog/beam-yaml-proto.md:
##########
+## Simplify pipelines with Beam YAML
+
+Creating a pipeline in Beam can be somewhat difficult, especially for new Apache Beam users.
+Setting up the project, managing dependencies, and so on can be challenging.
+By using Beam YAML, you can eliminate most of the boilerplate code,
+which allows you to focus on the most important part of the work: data transformation.
+
+Some of the key benefits of Beam YAML include:
+
+* **Readability:** By using a declarative language ([YAML](https://yaml.org/)), the pipeline configuration is more human readable.
+* **Reusability:** Reusing the same components across different pipelines is simplified.
+* **Maintainability:** Pipeline maintenance and updates are easier.
+
+The following template shows an example of reading events from a [Kafka](https://kafka.apache.org/intro) topic and
+writing them into [BigQuery](https://cloud.google.com/bigquery?hl=en).
+
+```yaml
+pipeline:
+  transforms:
+    - type: ReadFromKafka
+      name: ReadProtoMovieEvents
+      config:
+        topic: 'TOPIC_NAME'
+        format: RAW/AVRO/JSON/PROTO
+        bootstrap_servers: 'BOOTSTRAP_SERVERS'
+        schema: 'SCHEMA'
+    - type: WriteToBigQuery
+      name: WriteMovieEvents
+      input: ReadProtoMovieEvents
+      config:
+        table: 'PROJECT_ID.DATASET.MOVIE_EVENTS_TABLE'
+        useAtLeastOnceSemantics: true
+
+options:
+  streaming: true
+  dataflow_service_options: [streaming_mode_at_least_once]
+```
+
+## The complete workflow
+
+This section demonstrates the complete workflow for this pipeline.
+
+### Create a simple proto event

Review Comment:
   ```suggestion
   ### Create a simple proto event

   The following code creates a simple movie event.
   ```

##########
website/www/site/content/en/blog/beam-yaml-proto.md:
##########
+-->
+
+As streaming data processing grows, so do its maintenance, complexity, and costs.

Review Comment:
   ```suggestion
   # Efficient Streaming Data Processing with Beam YAML and Protobuf

   As streaming data processing grows, so do its maintenance, complexity, and costs.
   ```

##########
website/www/site/content/en/blog/beam-yaml-proto.md:
##########
+-->
+
+As streaming data processing grows, so do its maintenance, complexity, and costs.

Review Comment:
   I think we need to add the title here as an H1.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
