Hi David et al,

I agree with all the suggestions, but from what I've seen, flaky tests
tend to get ignored, and I'm afraid that disabling them would lead to them
being forgotten.  If Jira is accurate, we have plenty of tickets that have
been open for > 2 years
<https://issues.apache.org/jira/issues/?jql=project%20%3D%20KAFKA%20AND%20status%20%3D%20Open%20AND%20labels%20%3D%20flaky-test%20ORDER%20BY%20created%20ASC>.
I do think Divij's call is a good initiative, but keep in mind that tackling
these flaky tests can take significant time and effort, outside of one's
full-time job.  I think the very least one can do, for the near term, is to
ensure there is no red build, as I have seen quite a few PRs get merged
with a broken build and break trunk.

P

On Mon, Nov 13, 2023 at 7:41 AM Divij Vaidya <divijvaidy...@gmail.com>
wrote:

> > Please, do it.
> > We can use specific labels to effectively filter those tickets.
>
> We already have a label and a way to discover flaky tests. They are tagged
> with the label "flaky-test" [1]. There is also a label "newbie" [2] meant
> for folks who are new to Apache Kafka code base.
> My suggestion is to send a broader email to the community (since many will
> miss details in this thread) and call for action for committers to
> volunteer as "shepherds" for these tickets. I can send one out once we have
> some consensus wrt next steps in this thread.
>
>
> [1]
>
> https://issues.apache.org/jira/browse/KAFKA-13421?jql=project%20%3D%20KAFKA%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened%2C%20%22Patch%20Available%22)%20AND%20resolution%20%3D%20Unresolved%20AND%20labels%20%3D%20flaky-test%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC
>
>
> [2] https://kafka.apache.org/contributing -> Finding a project to work on
>
>
> Divij Vaidya
>
>
>
> On Mon, Nov 13, 2023 at 4:24 PM Николай Ижиков <nizhi...@apache.org>
> wrote:
>
> >
> > > To kickstart this effort, we can publish a list of such tickets in the
> > > community and assign one or more committers the role of a "shepherd" for
> > > each ticket.
> >
> > Please, do it.
> > We can use a specific label to effectively filter those tickets.
> >
> > > On Nov 13, 2023, at 15:16, Divij Vaidya <divijvaidy...@gmail.com>
> > > wrote:
> > >
> > > Thanks for bringing this up David.
> > >
> > > My primary concern revolves around the possibility that the currently
> > > disabled tests may remain inactive indefinitely. We currently have
> > > unresolved JIRA tickets for flaky tests that have been pending for an
> > > extended period. I am inclined to support the idea of disabling these
> > > tests temporarily and merging changes only when the build is successful,
> > > provided there is a clear plan for re-enabling them in the future.
> > >
> > > To address this issue, I propose the following measures:
> > >
> > > 1. Foster a supportive environment for new contributors within the
> > > community, encouraging them to take on tickets associated with flaky
> > > tests. This initiative would require individuals familiar with the
> > > relevant code to offer guidance to those undertaking these tasks.
> > > Committers should prioritize reviewing and addressing these tickets
> > > within their available bandwidth. To kickstart this effort, we can
> > > publish a list of such tickets in the community and assign one or more
> > > committers the role of a "shepherd" for each ticket.
> > >
> > > 2. Implement a policy to block minor version releases until the Release
> > > Manager (RM) is satisfied that the disabled tests do not result in gaps
> > > in our testing coverage. The RM may rely on Subject Matter Experts (SMEs)
> > > in the specific code areas to provide assurance before giving the green
> > > light for a release.
> > >
> > > 3. Set a community-wide goal for 2024 to achieve a stable Continuous
> > > Integration (CI) system. This goal should encompass projects such as
> > > refining our test suite to eliminate flakiness and addressing
> > > infrastructure issues if necessary. By publishing this goal, we create a
> > > shared vision for the community in 2024, fostering alignment on our
> > > objectives. This alignment will aid in prioritizing tasks for community
> > > members and guide reviewers in allocating their bandwidth effectively.
> > >
> > > --
> > > Divij Vaidya
> > >
> > >
> > >
> > > On Sun, Nov 12, 2023 at 2:58 AM Justine Olshan
> > > <jols...@confluent.io.invalid> wrote:
> > >
> > >> I will say that I have also seen tests that are only intermittently
> > >> flaky. A test may be OK for some time, and then suddenly the CI is
> > >> overloaded and we see issues.
> > >> I have also seen the CI struggling with running out of space recently,
> > >> so I wonder if we can also try to improve things on that front.
> > >>
> > >> FWIW, I noticed, filed, or commented on several flaky test JIRAs last
> > >> week. I'm happy to try to get to green builds, but everyone needs to be
> > >> on board.
> > >>
> > >> https://issues.apache.org/jira/browse/KAFKA-15529
> > >> https://issues.apache.org/jira/browse/KAFKA-14806
> > >> https://issues.apache.org/jira/browse/KAFKA-14249
> > >> https://issues.apache.org/jira/browse/KAFKA-15798
> > >> https://issues.apache.org/jira/browse/KAFKA-15797
> > >> https://issues.apache.org/jira/browse/KAFKA-15690
> > >> https://issues.apache.org/jira/browse/KAFKA-15699
> > >> https://issues.apache.org/jira/browse/KAFKA-15772
> > >> https://issues.apache.org/jira/browse/KAFKA-15759
> > >> https://issues.apache.org/jira/browse/KAFKA-15760
> > >> https://issues.apache.org/jira/browse/KAFKA-15700
> > >>
> > >> I've also seen that the KRaft transaction tests are often flaky: the
> > >> producer ID is not allocated and the test times out.
> > >> I can file a JIRA for that too.
> > >>
> > >> Hopefully this is a place we can start from.
> > >>
> > >> Justine
> > >>
> > >> On Sat, Nov 11, 2023 at 11:35 AM Ismael Juma <m...@ismaeljuma.com>
> > >> wrote:
> > >>
> > >>> On Sat, Nov 11, 2023 at 10:32 AM John Roesler <vvcep...@apache.org>
> > >> wrote:
> > >>>
> > >>>> In other words, I’m biased to think that new flakiness indicates
> > >>>> non-deterministic bugs more often than it indicates a bad test.
> > >>>>
> > >>>
> > >>> My experience is exactly the opposite. As someone who has tracked many
> > >>> of the flaky fixes, the vast majority of the time they are an issue
> > >>> with the test.
> > >>>
> > >>> Ismael
> > >>>
> > >>
> >
> >
>
