Re: Beam Infrastructure - Highlights of Recent Changes

Robert Bradshaw via dev Thu, 13 Jun 2024 11:38:08 -0700

Thanks for the update. Lots of great stuff here!


On Thu, Jun 13, 2024 at 8:27 AM Andrey Devyatkin via dev
<dev@beam.apache.org> wrote:
>
> Hey Beam community,
>
> We are glad to announce some new changes that our team has been working on 
> for a while to implement new solutions and enhance the existing ones. The 
> main focus was to improve the Beam infrastructure by increasing test 
> coverage, adding a reporting mechanism, and enhancing the level of code 
> analysis:
>
> Load and Stress Tests for streaming cases
> We've laid the foundation for implementing Stress Tests to be used for 
> writing new tests and improving existing ones. Stress Tests were introduced 
> for the following IOs:
>
> BigQueryIO
> BigTableIO
> KafkaIO
> PubSubIO
> SpannerIO
>
> We've also implemented a new Load Test for PubSubIO.
> The intention behind the Stress Tests is to measure how a write operation 
> behaves under load and use the results to define potential SLAs for IOs. As a 
> result, we came up with a document describing all the experiments conducted 
> during the implementation, which helped us identify some bugs related to 
> missing records. The document contains a set of prerequisites, links to the 
> PRs and write jobs.
> For more information about the experiments: 
> https://docs.google.com/document/d/1CVywXz7WwidIMYEp0iAMmQmmMfcExDvkGDK3dOq1bUs/edit?usp=sharing
>
> Training DuetAI for Dataflow
> We continue to enrich the knowledge base of DuetAI so that it knows even more 
> about Apache Beam: starting from basic questions related to documentation and 
> ending with generating code examples on how to use I/O connectors and 
> explaining to the user what a particular piece of code provided by them does. 
> The knowledge base contains 56 prompt/response pairs for documentation 
> lookup, 11 code generation prompts and 11 code explanation prompts, covering 
> various I/O connectors implemented in Java and Python.
> See the knowledge base: 
> https://github.com/apache/beam/tree/master/learning/prompts
>
> Beam Flaky Test Detection
> We've developed a reporting mechanism that flags flaky test cases when runs 
> fail repeatedly. Previously, there was no clear signal of which tests were 
> consistently flaky. Now, the tool monitors the current statistics 
> and creates a GitHub issue with a link to Grafana attached. You may have 
> noticed the open issues with the name "The <job_name> is flaky" in the daily 
> Beam High Priority Issue Report.
> For more information on how the tool works: 
> https://docs.google.com/document/d/13lwRAWoE7XA2ig0TDt98pI_nVBEQQ2UYqeUPJ0rGnME/edit?usp=sharing
>
> Beam Code Coverage Analysis
> There were some gaps in Python code coverage and no coverage analysis for 
> Java. To address this, we fixed configuration issues in the Jacoco plugin so 
> that it generates the .xml files used to display statistics in the Codecov 
> report, and adjusted the configuration for Python.
> For more details: 
> https://docs.google.com/document/d/1186dvd1t774EydPW0T31ynmwYxjqXUo9bh9nCO17rO0/edit?usp=sharing
>
> Beam Playground
> We've added a Playground CI Nightly check to make sure that Playground 
> examples keep working across SDK changes and similar updates. This will help 
> ensure that the examples are always up-to-date and that users can 
> successfully use them.
>
>
> Taking this opportunity, I would like to thank our team for these changes:
>
> Vlado Djerek (vlado.dje...@akvelon.com)
> Vitaly Terentyev (vitaly.terent...@akvelon.com)
> Akarys Shorabek (akarys.shora...@akvelon.com)
> Oleg Borisevich (oleg.borisev...@akvelon.com)
> Daria Bezkorovaina (daria.bezkorova...@akvelon.com)
> Danny McCormick (dannymccorm...@google.com)
> Yi Hu (ya...@google.com)
> XQ Hu (x...@google.com)
>
>
>
> Feel free to reach out to any of us if you have any questions.
>
>
> Thanks,
> Andrey
