Hi everyone,

We have seen an increased number of test job failures recently.
In terms of numbers (based on my memory and http://35.226.225.164/):
Java precommits went down from ~55% to ~30% of jobs succeeding.
Java postcommits went down from ~60% to ~40% of jobs succeeding.

I'm currently triaging post-commit failures and wonder whether it would be
useful to send regular updates on the issues found and the fixes implemented.

What could be included in an update:
* Test greenness based on http://35.226.225.164/ (work on a better dashboard
  is in progress)
* List of Jira tickets with triaged failures that have no owners
* List of Jira tickets in progress and who is working on the fixes
* List of Jira tickets with fixes shipped

Each point can also include a short description of the failure reason.

I believe such an update, sent daily or bi-daily, can increase visibility of
known failures, make it easier to find people who can fix tests, and provide
a useful way to track status.

What do you think?

Regards,
--Mikhail

Have feedback <http://go/migryz-feedback>?

On Fri, Aug 10, 2018 at 1:24 PM Mikhail Gryzykhin <mig...@google.com> wrote:

> Hi everyone,
>
> I'm following up on tackling post-commit test greenness. (See the Beam
> post-commit policies
> <https://beam.apache.org/contribute/postcommits-policies/>.)
>
> During this week, I've assembled a list of the most problematic flaky or
> failing tests
> <https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20resolution%20%3D%20Unresolved%20AND%20component%20%3D%20test-failures%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC>.
> Unfortunately, I'm relatively new to the project and lack triaging guides,
> so most of the tickets contain only basic information.
>
> *I want to ask for the community's help in the following areas:*
> 1. If you know how to triage tests or where a triage guide is located,
> please share the knowledge. You can post links here, or add pages to the
> Confluence wiki <https://cwiki.apache.org/confluence/display/BEAM/> and
> share the link here.
> 2. Please check the Jira test-failures list
> <https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20resolution%20%3D%20Unresolved%20AND%20component%20%3D%20test-failures%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC>
> and pick up tests that you might know how to fix. Tickets that do not
> have an owner are not currently being worked on. I'm trying out easy
> mitigations for some of the failures (e.g., increasing timeouts), but
> those should not be treated as fixes.
>
> *Current status:*
> Items that are marked critical in the failures list tend to fail jobs in
> ~5-10% of runs each.
>
> I contacted Anton Kedin directly, and he is currently working on fixes
> for a couple of the most problematic flakes. Anton, thank you for picking
> those up.
>
> Please update the owner and status of a ticket if you start working on a
> test failure; this will save time for others who might also start looking
> into the failure.
>
> Thank you,
> --Mikhail
>
> Have feedback <http://go/migryz-feedback>?