Hey Beam community,

We are glad to announce some changes our team has been working on for a while, covering both new solutions and enhancements to existing ones. The main focus was to improve the Beam infrastructure by increasing test coverage, adding a reporting mechanism, and improving code analysis:

* Load and Stress Tests for streaming cases

We've laid the foundation for Stress Tests, which can be used for writing new tests and improving existing ones. Stress Tests were introduced for the following IOs:

  * BigQueryIO
  * BigTableIO
  * KafkaIO
  * PubSubIO
  * SpannerIO

We've also implemented a new Load Test for PubSubIO. The intention behind the Stress Tests is to measure how a write operation behaves under load and to use the results to define potential SLAs for IOs. We documented all the experiments conducted during the implementation, which helped us identify some bugs related to missing records. The document contains a set of prerequisites and links to the PRs and write jobs.

For more information about the experiments:
https://docs.google.com/document/d/1CVywXz7WwidIMYEp0iAMmQmmMfcExDvkGDK3dOq1bUs/edit?usp=sharing

* Training DuetAI for Dataflow

We continue to enrich the knowledge base of DuetAI so that it knows even more about Apache Beam, from basic documentation questions to generating code examples for I/O connectors and explaining what a piece of user-provided code does. The knowledge base contains 56 prompt/response pairs for documentation lookup, 11 code generation prompts and 11 code explanation prompts, covering various I/O connectors implemented in Java and Python.

See the knowledge base:
https://github.com/apache/beam/tree/master/learning/prompts

* Beam Flaky Test Detection

We've developed a reporting mechanism that flags flaky test cases when their runs fail repeatedly. Previously, there were no clear signals on which tests were consistently flaky. Now, the tool monitors the current statistics and creates a GitHub issue with a link to Grafana attached. You may have noticed open issues named "The <job_name> is flaky" in the daily Beam High Priority Issue Report.
For more information on how the tool works:
https://docs.google.com/document/d/13lwRAWoE7XA2ig0TDt98pI_nVBEQQ2UYqeUPJ0rGnME/edit?usp=sharing

* Beam Code Coverage Analysis

There were some gaps in Python code coverage and no coverage analysis for Java. To address this, we fixed configuration issues so that the JaCoCo plugin generates the .xml files used to display statistics in the Codecov report, and we adjusted the configuration for Python.

For more details:
https://docs.google.com/document/d/1186dvd1t774EydPW0T31ynmwYxjqXUo9bh9nCO17rO0/edit?usp=sharing

* Beam Playground

We've added a Playground CI Nightly check to make sure that Playground examples keep working across SDK changes and other updates. This will help ensure that the examples are always up to date and that users can run them successfully.

Taking this opportunity, I would like to thank our team for these changes:

  * Vlado Djerek <https://github.com/volatilemolotov> (vlado.dje...@akvelon.com)
  * Vitaly Terentyev <https://github.com/Amar3tto> (vitaly.terent...@akvelon.com)
  * Akarys Shorabek <https://github.com/akashorabek> (akarys.shora...@akvelon.com)
  * Oleg Borisevich <https://github.com/olehborysevych> (oleg.borisev...@akvelon.com)
  * Daria Bezkorovaina <https://github.com/dariabezkorovaina> (daria.bezkorova...@akvelon.com)
  * Danny McCormick <https://github.com/damccorm> (dannymccorm...@google.com)
  * Yi Hu <https://github.com/Abacn> (ya...@google.com)
  * XQ Hu <https://github.com/liferoad> (x...@google.com)

Feel free to reach out to any of us if you have any questions.

Thanks,
Andrey