> Please, do it. We can use specific labels to effectively filter those tickets.
We already have a label and a way to discover flaky tests. They are tagged
with the label "flaky-test" [1]. There is also a label "newbie" [2] meant
for folks who are new to Apache Kafka code base.

My suggestion is to send a broader email to the community (since many will
miss details in this thread) and call for action for committers to
volunteer as "shepherds" for these tickets. I can send one out once we have
some consensus wrt next steps in this thread.

[1]
https://issues.apache.org/jira/browse/KAFKA-13421?jql=project%20%3D%20KAFKA%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened%2C%20%22Patch%20Available%22)%20AND%20resolution%20%3D%20Unresolved%20AND%20labels%20%3D%20flaky-test%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC
[2] https://kafka.apache.org/contributing -> Finding a project to work on

Divij Vaidya

On Mon, Nov 13, 2023 at 4:24 PM Николай Ижиков <nizhi...@apache.org> wrote:

> > To kickstart this effort, we can publish a list of such tickets in the
> > community and assign one or more committers the role of a "shepherd" for
> > each ticket.
>
> Please, do it.
> We can use specific label to effectively filter those tickets.
>
> > On Nov 13, 2023, at 15:16, Divij Vaidya <divijvaidy...@gmail.com> wrote:
> >
> > Thanks for bringing this up David.
> >
> > My primary concern revolves around the possibility that the currently
> > disabled tests may remain inactive indefinitely. We currently have
> > unresolved JIRA tickets for flaky tests that have been pending for an
> > extended period. I am inclined to support the idea of disabling these
> > tests temporarily and merging changes only when the build is successful,
> > provided there is a clear plan for re-enabling them in the future.
> >
> > To address this issue, I propose the following measures:
> >
> > 1\ Foster a supportive environment for new contributors within the
> > community, encouraging them to take on tickets associated with flaky
> > tests.
> > This initiative would require individuals familiar with the relevant code
> > to offer guidance to those undertaking these tasks. Committers should
> > prioritize reviewing and addressing these tickets within their available
> > bandwidth. To kickstart this effort, we can publish a list of such tickets
> > in the community and assign one or more committers the role of a
> > "shepherd" for each ticket.
> >
> > 2\ Implement a policy to block minor version releases until the Release
> > Manager (RM) is satisfied that the disabled tests do not result in gaps in
> > our testing coverage. The RM may rely on Subject Matter Experts (SMEs) in
> > the specific code areas to provide assurance before giving the green light
> > for a release.
> >
> > 3\ Set a community-wide goal for 2024 to achieve a stable Continuous
> > Integration (CI) system. This goal should encompass projects such as
> > refining our test suite to eliminate flakiness and addressing
> > infrastructure issues if necessary. By publishing this goal, we create a
> > shared vision for the community in 2024, fostering alignment on our
> > objectives. This alignment will aid in prioritizing tasks for community
> > members and guide reviewers in allocating their bandwidth effectively.
> >
> > --
> > Divij Vaidya
> >
> > On Sun, Nov 12, 2023 at 2:58 AM Justine Olshan
> > <jols...@confluent.io.invalid> wrote:
> >
> >> I will say that I have also seen tests that seem to be more flaky
> >> intermittently. It may be ok for some time and suddenly the CI is
> >> overloaded and we see issues.
> >> I have also seen the CI struggling with running out of space recently,
> >> so I wonder if we can also try to improve things on that front.
> >>
> >> FWIW, I noticed, filed, or commented on several flaky test JIRAs last
> >> week. I'm happy to try to get to green builds, but everyone needs to be
> >> on board.
> >>
> >> https://issues.apache.org/jira/browse/KAFKA-15529
> >> https://issues.apache.org/jira/browse/KAFKA-14806
> >> https://issues.apache.org/jira/browse/KAFKA-14249
> >> https://issues.apache.org/jira/browse/KAFKA-15798
> >> https://issues.apache.org/jira/browse/KAFKA-15797
> >> https://issues.apache.org/jira/browse/KAFKA-15690
> >> https://issues.apache.org/jira/browse/KAFKA-15699
> >> https://issues.apache.org/jira/browse/KAFKA-15772
> >> https://issues.apache.org/jira/browse/KAFKA-15759
> >> https://issues.apache.org/jira/browse/KAFKA-15760
> >> https://issues.apache.org/jira/browse/KAFKA-15700
> >>
> >> I've also seen that kraft transactions tests often flakily see that the
> >> producer id is not allocated and times out.
> >> I can file a JIRA for that too.
> >>
> >> Hopefully this is a place we can start from.
> >>
> >> Justine
> >>
> >> On Sat, Nov 11, 2023 at 11:35 AM Ismael Juma <m...@ismaeljuma.com> wrote:
> >>
> >>> On Sat, Nov 11, 2023 at 10:32 AM John Roesler <vvcep...@apache.org>
> >>> wrote:
> >>>
> >>>> In other words, I’m biased to think that new flakiness indicates
> >>>> non-deterministic bugs more often than it indicates a bad test.
> >>>>
> >>>
> >>> My experience is exactly the opposite. As someone who has tracked many
> >>> of the flaky fixes, the vast majority of the time they are an issue with
> >>> the test.
> >>>
> >>> Ismael
> >>>
> >>
> >