rszper commented on code in PR #32412:
URL: https://github.com/apache/beam/pull/32412#discussion_r1757630080

##########
website/www/site/content/en/blog/unit-testing-in-beam.md:
##########
@@ -0,0 +1,198 @@
+---
+title: "Unit Testing in Beam: An opinionated guide"
+date: 2024-09-13 00:00:01 -0800
+categories:
+  - blog
+aliases:
+  - /blog/2024/09/09/unit-testing-in-beam.html
+authors:
+  - svetakvsundhar
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+Testing remains one of the most fundamental components of software engineering. In this blog post, we shed light on some of the constructs that Apache Beam provides to allow for testing.

Review Comment:
```suggestion
Testing remains one of the most fundamental components of software engineering. In this blog post, we shed light on some of the constructs that Apache Beam provides for testing.
```

##########
website/www/site/content/en/blog/unit-testing-in-beam.md:
##########
@@ -0,0 +1,198 @@
+We cover an opinionated set of best practices to write unit tests for your data pipeline. This post doesn't include integration tests, and you need to author those separately.
+All snippets in this post are included in [this notebook](https://github.com/apache/beam/blob/master/examples/notebooks/blog/unittests_in_beam.ipynb). Additionally, to see tests that exhibit best practices, look at the [Beam starter projects](https://beam.apache.org/blog/beam-starter-projects/), as these contain tests that exhibit best practices.
+
+## Best practices
+
+When testing Beam pipelines, we recommend the following best practices:
+
+1) You don’t need to write any unit tests for the already supported connectors in the Beam Library, such as `ReadFromBigQuery` and `WriteToText`. These connectors are already tested in Beam’s test suite to ensure correct functionality. They add unnecessary cost and dependencies to a unit test.
+
+2) Ensure that your function is well tested when using it with `Map`, `FlatMap`, or `Filter`. You can assume your function will work as intended when using `Map(your_function)`.
+3) For more complex transforms such as `ParDo`’s, side inputs, timestamp inspection, etc., treat the entire transform as a unit, and test it.
+4) If needed, use mocking to mock any API calls that might be present in your DoFn. The purpose of mocking is to test your functionality extensively, even if this testing requires a specific response from an API call.
+
+    1) Be sure to modularize your API calls in separate functions, rather than making the API call directly in the `DoFn`. This step provides a cleaner experience when mocking the external API calls.
+
+
+## Example 1
+
+Use the following pipeline as an example. You don't have to write a separate unit test to test this function in the context of this pipeline, assuming the function `median_house_value_per_bedroom` is unit tested elsewhere in the code. You can trust that the `Map` primitive works as expected (this illustrates point #2 noted previously).
+
+```python
+# The following code computes the median house value per bedroom.
+
+with beam.Pipeline() as p1:
+    result = (
+        p1
+        | ReadFromText("/content/sample_data/california_housing_test.csv",skip_header_lines=1)
+        | beam.Map(median_house_value_per_bedroom)
+        | WriteToText("/content/example2")
+    )
+```
+
+## Example 2
+
+Use the following function as the example. The functions `median_house_value_per_bedroom` and `multiply_by_factor` are tested elsewhere, but the pipeline as a whole, which consists of composite transforms, is not.
+
+```python
+with beam.Pipeline() as p2:
+    result = (
+        p2
+        | ReadFromText("/content/sample_data/california_housing_test.csv",skip_header_lines=1)
+        | beam.Map(median_house_value_per_bedroom)
+        | beam.Map(multiply_by_factor)
+        | beam.CombinePerKey(sum)
+        | WriteToText("/content/example3")
+    )
+```
+
+The best practice for the previous code is to create a transform with all functions between `ReadFromText` and `WriteToText`. This step separates the transformation logic from the I/Os, allowing you to unit-test the transformation logic. The following example is a refactoring of the previous code:

Review Comment:
```suggestion
The best practice for the previous code is to create a transform with all functions between `ReadFromText` and `WriteToText`. This step separates the transformation logic from the I/Os, allowing you to unit test the transformation logic. The following example is a refactoring of the previous code:
```

##########
website/www/site/content/en/blog/unit-testing-in-beam.md:
##########
@@ -0,0 +1,198 @@
+All snippets in this post are included in [this notebook](https://github.com/apache/beam/blob/master/examples/notebooks/blog/unittests_in_beam.ipynb). Additionally, to see tests that exhibit best practices, look at the [Beam starter projects](https://beam.apache.org/blog/beam-starter-projects/), as these contain tests that exhibit best practices.

Review Comment:
```suggestion
All snippets in this post are included in [this notebook](https://github.com/apache/beam/blob/master/examples/notebooks/blog/unittests_in_beam.ipynb). Additionally, to see tests that exhibit best practices, look at the [Beam starter projects](https://beam.apache.org/blog/beam-starter-projects/), which contain tests that exhibit best practices.
```

##########
website/www/site/content/en/blog/unit-testing-in-beam.md:
##########
@@ -0,0 +1,198 @@
+1) You don’t need to write any unit tests for the already supported connectors in the Beam Library, such as `ReadFromBigQuery` and `WriteToText`. These connectors are already tested in Beam’s test suite to ensure correct functionality. They add unnecessary cost and dependencies to a unit test.

Review Comment:
```suggestion
1) Don't write unit tests for the already supported connectors in the Beam Library, such as `ReadFromBigQuery` and `WriteToText`. These connectors are already tested in Beam’s test suite to ensure correct functionality. They add unnecessary cost and dependencies to a unit test.
```
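
A possible illustration for best practice 4 in the quoted post (keep API calls in separate functions so that they can be mocked). This is only a minimal sketch that uses the Python standard library and no Beam imports. The names `fetch_exchange_rate` and `convert_price` are hypothetical; they do not come from the post or from Beam. In a real pipeline, `convert_price` would be the per-element logic called from a `DoFn` or from `beam.Map`.

```python
import unittest
from unittest import mock


def fetch_exchange_rate(currency):
    """Hypothetical wrapper around an external API call.

    It lives in its own function so that tests can swap it out.
    This placeholder raises instead of contacting a real service.
    """
    raise RuntimeError("external API is not reachable from unit tests")


def convert_price(record, fetch_rate=fetch_exchange_rate):
    """Hypothetical per-element logic; the API call is injected as a parameter."""
    rate = fetch_rate(record["currency"])
    # Return a new dict so the input element is not mutated.
    return {**record, "price_usd": record["price"] * rate}


class ConvertPriceTest(unittest.TestCase):
    def test_converts_with_mocked_rate(self):
        # The mock stands in for the API and returns a fixed response.
        fake_fetch = mock.Mock(return_value=2.0)
        result = convert_price(
            {"currency": "EUR", "price": 10.0}, fetch_rate=fake_fetch
        )
        fake_fetch.assert_called_once_with("EUR")
        self.assertEqual(
            result, {"currency": "EUR", "price": 10.0, "price_usd": 20.0}
        )
```

If this were saved in a test module, `python -m unittest` should discover and run it. Passing the API wrapper as a parameter is one design choice among several; patching the wrapper with `unittest.mock.patch` would be another way to reach the same goal.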
