Hi everyone,

We have seen an increased number of test job failures recently.

In terms of numbers (based on my memory and http://35.226.225.164/):
Java precommits went down from ~55% to ~30% of jobs succeeding.
Java postcommits went down from ~60% to ~40% of jobs succeeding.

I'm currently triaging post-commit failures and wonder whether it would be
useful to send regular updates on the issues found and the fixes implemented.

What an update could include:
* Test greenness based on http://35.226.225.164/ (work on a better dashboard
is in progress)
* List of Jira tickets for triaged failures that have no owner
* List of Jira tickets in progress and who is working on the fixes
* List of Jira tickets with fixes shipped

Each item can also have a short description of the failure reason.

I believe such an update, sent daily or bi-daily, can increase visibility of
known failures, simplify the search for people who can fix tests, and provide
a convenient way to track status.

What do you think?

Regards,
--Mikhail

Have feedback <http://go/migryz-feedback>?


On Fri, Aug 10, 2018 at 1:24 PM Mikhail Gryzykhin <mig...@google.com> wrote:

> Hi everyone,
>
> I'm following up on tackling post-commit test greenness (see the Beam
> post-commit policies
> <https://beam.apache.org/contribute/postcommits-policies/>).
>
> During this week, I've assembled a list of the most problematic flaky or
> failing tests
> <https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20resolution%20%3D%20Unresolved%20AND%20component%20%3D%20test-failures%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC>.
> Unfortunately, I'm relatively new to the project and lack triaging guides,
> so most of the tickets contain only basic information.
>
> *I want to ask for the community's help in the following areas:*
> 1. If you know how to triage tests or where to find a triage guide, please
> share the knowledge. You can post links here, or add pages to the Confluence
> wiki <https://cwiki.apache.org/confluence/display/BEAM/> and share the link
> here.
> 2. Please check the Jira test-failures
> <https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20resolution%20%3D%20Unresolved%20AND%20component%20%3D%20test-failures%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC> list
> and pick up tests that you might know how to fix, then help with fixing
> them. Tickets that currently have no owner are not being worked on. I'm
> trying out easy mitigations for some of the failures (i.e., increasing
> timeouts), but those should not be treated as fixes.
>
> *Current status:*
> Items that are marked critical in the failures list tend to fail jobs in
> ~5-10% of runs each.
>
> I contacted Anton Kedin directly, and he is currently working on fixes for
> a couple of the most problematic flakes. Anton, thank you for picking those up.
>
> Please update the owner and status of a ticket if you start working on a
> test failure; this will save time for others who might also start looking
> into the failure.
>
> Thank you,
> --Mikhail
>
> Have feedback <http://go/migryz-feedback>?
>
