Great job, thanks for your work!
On 4/11/25 16:01, XQ Hu via dev wrote:
Great work, Vitaly and your team! Thanks a lot!
On Fri, Apr 11, 2025 at 9:48 AM Vitaly Terentyev via dev
<dev@beam.apache.org> wrote:
Dear Community,
March was a dynamic month for Beam Infrastructure & Health. We
began and ended the month with a solid health level of 98.38%, but
encountered two temporary dips due to a combination of emerging
issues and system-level changes.
Health Trends and Incident Analysis:
* The first drop was linked to scattered failures across
  multiple areas of the codebase, including Python, Java, Go,
  and both Flink and Spark runners. These issues were quickly
  triaged and mitigated.
* The second drop occurred due to a method signature change that
  required an update to the Dataflow Java container version,
  alongside a group of failing XVR workflows caused by an
  integer overflow during varint32 encoding. These were promptly
  resolved.
Thanks to rapid resolution efforts, the system health recovered to
98.38% by the end of March. Please see the attached chart for
March's Health Status trends.
Key Improvements:
* Flaky Test Fixes:
  o PostCommit and PreCommit jobs across Java, Python, SQL,
    and Go.
  o XVR workflows and other runner-related jobs.
  o You can find the full list of the 21 closed or fixed issues
    here
    <https://github.com/apache/beam/issues?q=is%3Aissue%20state%3Aclosed%20label%3Aflaky_test%20closed%3A%3E2025-03-01%20%20closed%3A%3C2025-03-31%20(involves%3AAmar3tto%20OR%20involves%3Aakashorabek)%20>.
* Performance Metrics Update:
  o Added Performance Metrics for Python ML pipelines.
  o Updated Performance Metrics graphs on the Beam website
    <https://beam.apache.org/performance/> using
    Looker-generated images up to Beam 2.64.0.
Currently failing workflows
* Core Infrastructure (1)
  o Publish Beam SDK Snapshots
    <https://github.com/apache/beam/issues/32161>
* Important Signals (2)
  o PostCommit Python Arm
    <https://github.com/apache/beam/issues/30760>
  o PostCommit Python
    <https://github.com/apache/beam/issues/30513>
* Dataflow Java Tests (1)
  o PostCommit XVR GoUsingJava Dataflow
    <https://github.com/apache/beam/issues/30519>
* Python Runners Tests (1)
  o Python ValidatesContainer Dataflow ARM
    <https://github.com/apache/beam/issues/33065>
* Misc Tests (2)
  o IcebergIO Integration Tests
    <https://github.com/apache/beam/issues/31931>
  o PostCommit XVR Flink
    <https://github.com/apache/beam/issues/31418>
Ongoing and Future Work
* Continue stabilizing newly emerging issues, with particular
  attention to Python-related workflows.
* Investigate and fix instability in the IcebergIO Integration
  Tests workflow.
* Maintain high visibility of flaky and infra issues via our
  Health Dashboard.
As always, if you notice infrastructure-related issues, feel free
to open a GitHub issue with the label “infra”
<https://github.com/apache/beam/issues?q=is%3Aissue%20state%3Aopen%20label%3Ainfra>,
and our team will triage and handle it.
Your engagement makes a big difference — and is always welcome.
Best regards,
Vitaly Terentyev
Akvelon Inc.
Apache Beam Infrastructure Team