Hi all,

There's a lot to catch up on here but I wanted to clarify something.
Regarding this comment from Sophie:


> Yet multiple people in this thread so far have voiced support for
> "gating merges on the successful completion of all parts of the build
> except tests". Just to be totally clear, I really don't think that was
> ever in question -- though it certainly doesn't hurt to remind everyone.

> So, this thread is not about whether or not to merge with failing
> *builds*, but whether it should be acceptable to merge with failing
> *tests*.


I think there's a misunderstanding here. I was suggesting
programmatic gating, not manual. If we could automatically block these types
of changes from being merged, instead of relying on committers to check and
interpret Jenkins results, that'd be a quick win IMO. And, because of the
already-discussed issues with flaky tests, it seemed impractical to block
PRs from being merged on failing tests -- hence the suggestion to gate only
on the other parts of the build.

However, I think the retry logic brought up by David could be sufficient to
skip that kind of intermediate step and allow us to just start
programmatically blocking PR merges if the build (including tests) fails.
But if anyone's interested, we can still prevent failing tests from failing
the build with the ignoreFailures property [1].

[1] -
https://docs.gradle.org/current/dsl/org.gradle.api.tasks.testing.Test.html#org.gradle.api.tasks.testing.Test:ignoreFailures
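
For reference, a minimal sketch of what that would look like in
build.gradle (illustrative only; Kafka's real test configuration is more
involved):

```groovy
test {
    // Tests still run and failures still show up in the test report,
    // but a failing test no longer fails the test task (or the build).
    ignoreFailures = true
}
```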

Cheers,

Chris

On Wed, Nov 22, 2023 at 3:00 AM Ismael Juma <m...@ismaeljuma.com> wrote:

> I think it breaks the Jenkins output otherwise. Feel free to test it via a
> PR.
>
> Ismael
>
> On Wed, Nov 22, 2023, 12:42 AM David Jacot <dja...@confluent.io.invalid>
> wrote:
>
> > Hi Ismael,
> >
> > No, I was not aware of KAFKA-12216. My understanding is that we could
> still
> > do it without the JUnitFlakyTestDataPublisher plugin and we could use
> > Gradle Enterprise for this. Or do you think that reporting the flaky
> tests
> > in the build results is required?
> >
> > David
> >
> > On Wed, Nov 22, 2023 at 9:35 AM Ismael Juma <m...@ismaeljuma.com> wrote:
> >
> > > Hi David,
> > >
> > > Did you take a look at
> https://issues.apache.org/jira/browse/KAFKA-12216
> > ?
> > > I
> > > looked into this option already (yes, there isn't much that we haven't
> > > considered in this space).
> > >
> > > Ismael
> > >
> > > On Wed, Nov 22, 2023 at 12:24 AM David Jacot
> <dja...@confluent.io.invalid
> > >
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > Thanks for the good discussion and all the comments. Overall, it
> seems
> > > that
> > > > we all agree on the bad state of our CI. That's a good first step!
> > > >
> > > > I have talked to a few folks this week about it and it seems that
> many
> > > > folks (including me) are not comfortable with merging PRs at the
> moment
> > > > because the results of our builds are so bad. I had 40+ failed tests
> in
> > > one
> > > > of my PRs, all unrelated to my changes. It is really hard to be
> > > productive
> > > > with this.
> > > >
> > > > Personally, I really want to move towards requiring a green build to
> > > merge
> > > > to trunk because this is a clear and binary signal. I agree that we
> > need
> > > to
> > > > stabilize the builds before we could even require this so here is my
> > > > proposal.
> > > >
> > > > 1) We could leverage the `reports.junitXml.mergeReruns` option in
> > gradle.
> > > > From the doc [1]:
> > > >
> > > > > When mergeReruns is enabled, if a test fails but is then retried
> and
> > > > succeeds, its failures will be recorded as <flakyFailure> instead of
> > > > <failure>, within one <testcase>. This is effectively the reporting
> > > > produced by the surefire plugin of Apache Maven™ when enabling
> reruns.
> > If
> > > > your CI server understands this format, it will indicate that the
> test
> > > was
> > > > flaky. If it does not, it will indicate that the test succeeded as
> it
> > > > will ignore the <flakyFailure> information. If the test does not
> > succeed
> > > > (i.e. it fails for every retry), it will be indicated as having
> failed
> > > > whether your tool understands this format or not.
> > > > > When mergeReruns is disabled (the default), each execution of a
> test
> > > will
> > > > be listed as a separate test case.
> > > >
> > > > It would not resolve all the flaky tests for sure but it would at
> least
> > > > reduce the noise. I see this as a means to get to green builds
> faster.
> > I
> > > > played a bit with this setting and I discovered [2]. I was hoping
> that
> > > [3]
> > > > could help to resolve it but I need to confirm.
> > > >
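> > > > For reference, a minimal build.gradle sketch of the kind of change
> > > > involved (the retry block assumes the org.gradle.test-retry plugin,
> > > > which produces the reruns that mergeReruns then folds together; the
> > > > version and limits here are illustrative):
> > > >
> > > > ```groovy
> > > > plugins {
> > > >     id 'org.gradle.test-retry' version '1.5.6'
> > > > }
> > > >
> > > > test {
> > > >     // A test that fails and then passes on a retry is recorded as
> > > >     // <flakyFailure> instead of <failure> in the JUnit XML report.
> > > >     reports.junitXml.mergeReruns = true
> > > >     retry {
> > > >         maxRetries = 1
> > > >         maxFailures = 10 // stop retrying if the build is badly broken
> > > >     }
> > > > }
> > > > ```
> > > >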
> > > > 2) I suppose that we would still have flaky tests preventing us from
> > > > getting a green build even with the setting in place. For those, I
> > think
> > > > that we need to review them one by one and decide whether we want to
> > fix
> > > or
> > > > disable them. This is a short term effort to help us get to green
> > builds.
> > > >
> > > > 3) When we get to a point where we can get green builds consistently,
> > we
> > > > could enforce it.
> > > >
> > > > 4) Flaky tests won't disappear with this. They are just hidden.
> > > Therefore,
> > > > we also need a process to review the flaky tests and address them.
> > Here,
> > > I
> > > > think that we could leverage the dashboard shared by Ismael. One
> > > > possibility would be to review it regularly and decide for each test
> > > > whether it should be fixed, disabled or even removed.
> > > >
> > > > Please let me know what you think.
> > > >
> > > > Another angle that we could consider is improving the CI
> infrastructure
> > > as
> > > > well. I think that many of those flaky tests are due to overloaded
> > > Jenkins
> > > > workers. We should perhaps discuss with the infra team to see whether
> > we
> > > > could do something there.
> > > >
> > > > Best,
> > > > David
> > > >
> > > > [1]
> > > >
> > https://docs.gradle.org/current/userguide/java_testing.html#mergereruns
> > > > [2] https://github.com/gradle/gradle/issues/23324
> > > > [3] https://github.com/apache/kafka/pull/14687
> > > >
> > > >
> > > > On Wed, Nov 22, 2023 at 4:10 AM Ismael Juma <m...@ismaeljuma.com>
> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > We have a dashboard already:
> > > > >
> > > > > [screenshot of the flaky-test dashboard]
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> https://ge.apache.org/scans/tests?search.names=Git%20branch&search.relativeStartTime=P28D&search.rootProjectNames=kafka&search.timeZoneId=America%2FLos_Angeles&search.values=trunk&tests.sortField=FLAKY
> > > > >
> > > > > On Tue, Nov 14, 2023 at 10:41 PM Николай Ижиков <
> nizhi...@apache.org
> > >
> > > > > wrote:
> > > > >
> > > > >> Hello guys.
> > > > >>
> > > > >> I want to tell you about one more approach to dealing with flaky
> > > > >> tests. We adopted this approach in the Apache Ignite community, so
> > > > >> maybe it can be helpful for Kafka as well.
> > > > >>
> > > > >> TL;DR: The Apache Ignite community has a tool that provides
> > > > >> statistics on tests and can tell if a PR introduces new failures.
> > > > >>
> > > > >> Apache Ignite has many tests.
> > > > >> The latest «Run All» contains around 75k.
> > > > >> Most tests are integration-style, so the number of flaky ones is
> > > > >> significant.
> > > > >>
> > > > >> We built a tool - TeamCity Bot [1] -
> > > > >> that provides combined statistics on flaky tests [2].
> > > > >>
> > > > >> This tool can compare Run All results for a PR against master.
> > > > >> If all is OK, one can comment on the JIRA ticket with a visa from
> > > > >> the bot [3].
> > > > >>
> > > > >> A visa is a proof of PR quality for Ignite committers.
> > > > >> And we can sort out the most flaky tests and prioritize fixes with
> > > > >> the bot's statistics [2].
> > > > >>
> > > > >> The TC bot is integrated with TeamCity only, for now.
> > > > >> But if the Kafka community is interested, we can try to integrate
> > > > >> it with Jenkins.
> > > > >>
> > > > >> [1] https://github.com/apache/ignite-teamcity-bot
> > > > >> [2]
> > > > https://tcbot2.sbt-ignite-dev.ru/current.html?branch=master&count=10
> > > > >> [3]
> > > > >>
> > > >
> > >
> >
> https://issues.apache.org/jira/browse/IGNITE-19950?focusedCommentId=17767394&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17767394
> > > > >>
> > > > >>
> > > > >>
> > > > >> > On 15 Nov 2023, at 09:18, Ismael Juma <m...@ismaeljuma.com>
> > > > wrote:
> > > > >> >
> > > > >> > To use the pain analogy, people seem to have really good
> > painkillers
> > > > and
> > > > >> > hence they somehow don't feel the pain already. ;)
> > > > >> >
> > > > >> > The reality is that important and high quality tests will get
> > fixed.
> > > > >> Poor
> > > > >> > quality tests (low signal to noise ratio) might not get fixed
> and
> > > > >> that's ok.
> > > > >> >
> > > > >> > I'm not opposed to marking the tests as release blockers as a
> > > starting
> > > > >> > point, but I'm saying it's fine if people triage them and decide
> > > they
> > > > >> are
> > > > >> > not blockers. In fact, that has already happened in the past.
> > > > >> >
> > > > >> > Ismael
> > > > >> >
> > > > >> > On Tue, Nov 14, 2023 at 10:02 PM Matthias J. Sax <
> > mj...@apache.org>
> > > > >> wrote:
> > > > >> >
> > > > >> >> I agree on the test gap argument. However, my worry is, if we
> > don't
> > > > >> >> "force the pain", it won't get fixed at all. -- I also know,
> that
> > > we
> > > > >> try
> > > > >> >> to find a working approach for many years...
> > > > >> >>
> > > > >> >> My take is that if we disable a test and file a non-blocking
> > Jira,
> > > > it's
> > > > >> >> basically the same as just deleting the test altogether and
> > never
> > > > >> talk
> > > > >> >> about it again. -- I believe this is not what we aim for, but
> we
> > > aim
> > > > >> >> for good test coverage and a way to get these tests fixed?
> > > > >> >>
> > > > >> >> Thus IMHO we need some forcing function (either keep the tests
> > and
> > > > feel
> > > > >> >> the pain on every PR), or disable the test and file a blocker
> > JIRA
> > > so
> > > > >> >> the pain surfaces on a release forcing us to do something about
> > it.
> > > > >> >>
> > > > >> >> If there is no forcing function, it basically means we are
> > willing
> > > to
> > > > >> >> accept test gaps forever.
> > > > >> >>
> > > > >> >>
> > > > >> >> -Matthias
> > > > >> >>
> > > > >> >> On 11/14/23 9:09 PM, Ismael Juma wrote:
> > > > >> >>> Matthias,
> > > > >> >>>
> > > > >> >>> Flaky tests are worse than useless. I know engineers find it
> > hard
> > > to
> > > > >> >>> disable them because of the supposed test gap (I find it hard
> > > too),
> > > > >> but
> > > > >> >> the
> > > > >> >>> truth is that the test gap is already there! No-one blocks
> > merging
> > > > >> PRs on
> > > > >> >>> flaky tests, but they do get used to ignoring build failures.
> > > > >> >>>
> > > > >> >>> The current approach has been attempted for nearly a decade
> and
> > it
> > > > has
> > > > >> >>> never worked. I think we should try something different.
> > > > >> >>>
> > > > >> >>> When it comes to marking flaky tests as release blockers, I
> > don't
> > > > >> think
> > > > >> >>> this should be done as a general rule. We should instead
> assess
> > > on a
> > > > >> case
> > > > >> >>> by case basis, same way we do it for bugs.
> > > > >> >>>
> > > > >> >>> Ismael
> > > > >> >>>
> > > > >> >>> On Tue, Nov 14, 2023 at 5:02 PM Matthias J. Sax <
> > mj...@apache.org
> > > >
> > > > >> >> wrote:
> > > > >> >>>
> > > > >> >>>> Thanks for starting this discussion David! I totally agree to
> > > "no"!
> > > > >> >>>>
> > > > >> >>>> I think there is no excuse whatsoever for merging PRs with
> > > > >> compilation
> > > > >> >>>> errors (except an honest mistake for conflicting PRs that got
> > > > merged
> > > > >> >>>> interleaved). -- Every committer must(!) check the Jenkins
> > status
> > > > >> before
> > > > >> >>>> merging to avoid such an issue.
> > > > >> >>>>
> > > > >> >>>> Similar for actual permanently broken tests. If there is no
> > green
> > > > >> build,
> > > > >> >>>> and the same test failed across multiple Jenkins runs, a
> > > committer
> > > > >> >>>> should detect this and cannot merge a PR.
> > > > >> >>>>
> > > > >> >>>> Given the current state of the CI pipeline, it seems possible
> > to
> > > > get
> > > > >> >>>> green runs, and thus I support the policy (that we actually
> > > always
> > > > >> had)
> > > > >> >>>> to only merge if there is at least one green build. If
> > committers
> > > > got
> > > > >> >>>> sloppy about this, we need to call it out and put a hold on
> > this
> > > > >> >> practice.
> > > > >> >>>>
> > > > >> >>>> (The only exception from the above policy would be a very
> > > unstable
> > > > >> >>>> status for which getting a green build is not possible at
> all,
> > > due
> > > > to
> > > > >> >>>> too many flaky tests -- for this case, I would accept to
> merge
> > > even
> > > > >> >>>> if there is no green build, but committers need to manually
> ensure
> > > that
> > > > >> >>>> every test did pass in at least one test run. -- We had this
> in
> > > the
> > > > >> >>>> past, but I don't think we are in such a bad situation
> right
> > > > now).
> > > > >> >>>>
> > > > >> >>>> About disabling tests: I was never a fan of this, because in
> my
> > > > >> >>>> experience those tests are not fixed any time soon.
> Especially,
> > > > >> because
> > > > >> >>>> we do not consider such tickets as release blockers. To me,
> > > seeing
> > > > >> tests
> > > > >> >>>> fail on PR builds is actually a good forcing function for
> > people
> > > to
> > > > >> feel
> > > > >> >>>> the pain, and thus get motivated to make time to fix the
> tests.
> > > > >> >>>>
> > > > >> >>>> I have to admit that I was a little bit sloppy paying
> attention
> > > to
> > > > >> flaky
> > > > >> >>>> tests recently, and I highly appreciate this effort. Also
> > thanks
> > > to
> > > > >> >>>> everyone who actually filed a ticket! IMHO, we should file a
> > > ticket
> > > > >> for
> > > > >> >>>> every flaky test, and also keep adding comments each time we
> > see
> > > a
> > > > >> test
> > > > >> >>>> fail to be able to track the frequency at which a test
> fails,
> > so
> > > > we
> > > > >> can
> > > > >> >>>> fix the most pressing ones first.
> > > > >> >>>>
> > > > >> >>>> To me, the best forcing function to get tests stabilized is to
> > > file
> > > > >> >>>> tickets and consider them release blockers. Disabling tests
> > does
> > > > not
> > > > >> >>>> really help much IMHO to tackle the problem (we can of course
> > > still
> > > > >> >>>> disable them to get noise out of the system, but it would
> only
> > > > >> introduce
> > > > >> >>>> testing gaps for the time being and also does not help to
> > figure
> > > > out
> > > > >> how
> > > > >> >>>> often a test fails, so it's not a solution to the problem
> IMHO)
> > > > >> >>>>
> > > > >> >>>>
> > > > >> >>>> -Matthias
> > > > >> >>>>
> > > > >> >>>> On 11/13/23 11:40 PM, Sagar wrote:
> > > > >> >>>>> Hi Divij,
> > > > >> >>>>>
> > > > >> >>>>> I think this proposal overall makes sense. My only nit sort
> > of a
> > > > >> >>>> suggestion
> > > > >> >>>>> is that let's also consider a label called newbie++[1] for
> > flaky
> > > > >> tests
> > > > >> >> if
> > > > >> >>>>> we are considering adding newbie as a label. I think some of
> > the
> > > > >> flaky
> > > > >> >>>>> tests need familiarity with the codebase or the test setup
> so
> > > as a
> > > > >> >> first
> > > > >> >>>>> time contributor, it might be difficult. newbie++ IMO covers
> > > that
> > > > >> >> aspect.
> > > > >> >>>>>
> > > > >> >>>>> [1]
> > > > >> >>>>>
> > > > >> >>>>
> > > > >> >>
> > > > >>
> > > >
> > >
> >
> https://issues.apache.org/jira/browse/KAFKA-15406?jql=project%20%3D%20KAFKA%20AND%20labels%20%3D%20%22newbie%2B%2B%22
> > > > >> >>>>>
> > > > >> >>>>> Let me know what you think.
> > > > >> >>>>>
> > > > >> >>>>> Thanks!
> > > > >> >>>>> Sagar.
> > > > >> >>>>>
> > > > >> >>>>> On Mon, Nov 13, 2023 at 9:11 PM Divij Vaidya <
> > > > >> divijvaidy...@gmail.com>
> > > > >> >>>>> wrote:
> > > > >> >>>>>
> > > > >> >>>>>>>   Please, do it.
> > > > >> >>>>>> We can use specific labels to effectively filter those
> > tickets.
> > > > >> >>>>>>
> > > > >> >>>>>> We already have a label and a way to discover flaky tests.
> > They
> > > > are
> > > > >> >>>> tagged
> > > > >> >>>>>> with the label "flaky-test" [1]. There is also a label
> > "newbie"
> > > > [2]
> > > > >> >>>> meant
> > > > >> >>>>>> for folks who are new to Apache Kafka code base.
> > > > >> >>>>>> My suggestion is to send a broader email to the community
> > > (since
> > > > >> many
> > > > >> >>>> will
> > > > >> >>>>>> miss details in this thread) and call for action for
> > committers
> > > > to
> > > > >> >>>>>> volunteer as "shepherds" for these tickets. I can send one
> > out
> > > > >> once we
> > > > >> >>>> have
> > > > >> >>>>>> some consensus wrt next steps in this thread.
> > > > >> >>>>>>
> > > > >> >>>>>>
> > > > >> >>>>>> [1]
> > > > >> >>>>>>
> > > > >> >>>>>>
> > > > >> >>>>
> > > > >> >>
> > > > >>
> > > >
> > >
> >
> https://issues.apache.org/jira/browse/KAFKA-13421?jql=project%20%3D%20KAFKA%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened%2C%20%22Patch%20Available%22)%20AND%20resolution%20%3D%20Unresolved%20AND%20labels%20%3D%20flaky-test%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC
> > > > >> >>>>>>
> > > > >> >>>>>>
> > > > >> >>>>>> [2] https://kafka.apache.org/contributing -> Finding a
> > project
> > > > to
> > > > >> >> work
> > > > >> >>>> on
> > > > >> >>>>>>
> > > > >> >>>>>>
> > > > >> >>>>>> Divij Vaidya
> > > > >> >>>>>>
> > > > >> >>>>>>
> > > > >> >>>>>>
> > > > >> >>>>>> On Mon, Nov 13, 2023 at 4:24 PM Николай Ижиков <
> > > > >> nizhi...@apache.org>
> > > > >> >>>>>> wrote:
> > > > >> >>>>>>
> > > > >> >>>>>>>
> > > > >> >>>>>>>> To kickstart this effort, we can publish a list of such
> > > tickets
> > > > >> in
> > > > >> >> the
> > > > >> >>>>>>> community and assign one or more committers the role of a
> > > > >> «shepherd"
> > > > >> >>>> for
> > > > >> >>>>>>> each ticket.
> > > > >> >>>>>>>
> > > > >> >>>>>>> Please, do it.
> > > > >> >>>>>>> We can use a specific label to effectively filter those
> > tickets.
> > > > >> >>>>>>>
> > > > >> >>>>>>>> On 13 Nov 2023, at 15:16, Divij Vaidya <
> > > > divijvaidy...@gmail.com
> > > > >> >
> > > > >> >>>>>>> wrote:
> > > > >> >>>>>>>>
> > > > >> >>>>>>>> Thanks for bringing this up David.
> > > > >> >>>>>>>>
> > > > >> >>>>>>>> My primary concern revolves around the possibility that
> the
> > > > >> >> currently
> > > > >> >>>>>>>> disabled tests may remain inactive indefinitely. We
> > currently
> > > > >> have
> > > > >> >>>>>>>> unresolved JIRA tickets for flaky tests that have been
> > > pending
> > > > >> for
> > > > >> >> an
> > > > >> >>>>>>>> extended period. I am inclined to support the idea of
> > > disabling
> > > > >> >> these
> > > > >> >>>>>>> tests
> > > > >> >>>>>>>> temporarily and merging changes only when the build is
> > > > >> successful,
> > > > >> >>>>>>> provided
> > > > >> >>>>>>>> there is a clear plan for re-enabling them in the future.
> > > > >> >>>>>>>>
> > > > >> >>>>>>>> To address this issue, I propose the following measures:
> > > > >> >>>>>>>>
> > > > >> >>>>>>>> 1\ Foster a supportive environment for new contributors
> > > within
> > > > >> the
> > > > >> >>>>>>>> community, encouraging them to take on tickets associated
> > > with
> > > > >> flaky
> > > > >> >>>>>>> tests.
> > > > >> >>>>>>>> This initiative would require individuals familiar with
> the
> > > > >> relevant
> > > > >> >>>>>> code
> > > > >> >>>>>>>> to offer guidance to those undertaking these tasks.
> > > Committers
> > > > >> >> should
> > > > >> >>>>>>>> prioritize reviewing and addressing these tickets within
> > > their
> > > > >> >>>>>> available
> > > > >> >>>>>>>> bandwidth. To kickstart this effort, we can publish a
> list
> > of
> > > > >> such
> > > > >> >>>>>>> tickets
> > > > >> >>>>>>>> in the community and assign one or more committers the
> role
> > > of
> > > > a
> > > > >> >>>>>>> "shepherd"
> > > > >> >>>>>>>> for each ticket.
> > > > >> >>>>>>>>
> > > > >> >>>>>>>> 2\ Implement a policy to block minor version releases
> until
> > > the
> > > > >> >>>> Release
> > > > >> >>>>>>>> Manager (RM) is satisfied that the disabled tests do not
> > > result
> > > > >> in
> > > > >> >>>> gaps
> > > > >> >>>>>>> in
> > > > >> >>>>>>>> our testing coverage. The RM may rely on Subject Matter
> > > Experts
> > > > >> >> (SMEs)
> > > > >> >>>>>> in
> > > > >> >>>>>>>> the specific code areas to provide assurance before
> giving
> > > the
> > > > >> green
> > > > >> >>>>>>> light
> > > > >> >>>>>>>> for a release.
> > > > >> >>>>>>>>
> > > > >> >>>>>>>> 3\ Set a community-wide goal for 2024 to achieve a stable
> > > > >> Continuous
> > > > >> >>>>>>>> Integration (CI) system. This goal should encompass
> > projects
> > > > >> such as
> > > > >> >>>>>>>> refining our test suite to eliminate flakiness and
> > addressing
> > > > >> >>>>>>>> infrastructure issues if necessary. By publishing this
> > goal,
> > > we
> > > > >> >> create
> > > > >> >>>>>> a
> > > > >> >>>>>>>> shared vision for the community in 2024, fostering
> > alignment
> > > on
> > > > >> our
> > > > >> >>>>>>>> objectives. This alignment will aid in prioritizing tasks
> > for
> > > > >> >>>> community
> > > > >> >>>>>>>> members and guide reviewers in allocating their bandwidth
> > > > >> >> effectively.
> > > > >> >>>>>>>>
> > > > >> >>>>>>>> --
> > > > >> >>>>>>>> Divij Vaidya
> > > > >> >>>>>>>>
> > > > >> >>>>>>>>
> > > > >> >>>>>>>>
> > > > >> >>>>>>>> On Sun, Nov 12, 2023 at 2:58 AM Justine Olshan
> > > > >> >>>>>>> <jols...@confluent.io.invalid>
> > > > >> >>>>>>>> wrote:
> > > > >> >>>>>>>>
> > > > >> >>>>>>>>> I will say that I have also seen tests that seem to be
> > more
> > > > >> flaky
> > > > >> >>>>>>>>> intermittently. It may be ok for some time and suddenly
> > the
> > > CI
> > > > >> is
> > > > >> >>>>>>>>> overloaded and we see issues.
> > > > >> >>>>>>>>> I have also seen the CI struggling with running out of
> > space
> > > > >> >>>> recently,
> > > > >> >>>>>>> so I
> > > > >> >>>>>>>>> wonder if we can also try to improve things on that
> front.
> > > > >> >>>>>>>>>
> > > > >> >>>>>>>>> FWIW, I noticed, filed, or commented on several flaky
> test
> > > > JIRAs
> > > > >> >> last
> > > > >> >>>>>>> week.
> > > > >> >>>>>>>>> I'm happy to try to get to green builds, but everyone
> > needs
> > > to
> > > > >> be
> > > > >> >> on
> > > > >> >>>>>>> board.
> > > > >> >>>>>>>>>
> > > > >> >>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-15529
> > > > >> >>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-14806
> > > > >> >>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-14249
> > > > >> >>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-15798
> > > > >> >>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-15797
> > > > >> >>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-15690
> > > > >> >>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-15699
> > > > >> >>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-15772
> > > > >> >>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-15759
> > > > >> >>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-15760
> > > > >> >>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-15700
> > > > >> >>>>>>>>>
> > > > >> >>>>>>>>> I've also seen that kraft transactions tests often
> flakily
> > > see
> > > > >> that
> > > > >> >>>>>> the
> > > > >> >>>>>>>>> producer id is not allocated and times out.
> > > > >> >>>>>>>>> I can file a JIRA for that too.
> > > > >> >>>>>>>>>
> > > > >> >>>>>>>>> Hopefully this is a place we can start from.
> > > > >> >>>>>>>>>
> > > > >> >>>>>>>>> Justine
> > > > >> >>>>>>>>>
> > > > >> >>>>>>>>> On Sat, Nov 11, 2023 at 11:35 AM Ismael Juma <
> > > > m...@ismaeljuma.com
> > > > >> >
> > > > >> >>>>>> wrote:
> > > > >> >>>>>>>>>
> > > > >> >>>>>>>>>> On Sat, Nov 11, 2023 at 10:32 AM John Roesler <
> > > > >> >> vvcep...@apache.org>
> > > > >> >>>>>>>>> wrote:
> > > > >> >>>>>>>>>>
> > > > >> >>>>>>>>>>> In other words, I’m biased to think that new flakiness
> > > > >> indicates
> > > > >> >>>>>>>>>>> non-deterministic bugs more often than it indicates a
> > bad
> > > > >> test.
> > > > >> >>>>>>>>>>>
> > > > >> >>>>>>>>>>
> > > > >> >>>>>>>>>> My experience is exactly the opposite. As someone who
> has
> > > > >> tracked
> > > > >> >>>>>> many
> > > > >> >>>>>>> of
> > > > >> >>>>>>>>>> the flaky fixes, the vast majority of the time they are
> > an
> > > > >> issue
> > > > >> >>>> with
> > > > >> >>>>>>> the
> > > > >> >>>>>>>>>> test.
> > > > >> >>>>>>>>>>
> > > > >> >>>>>>>>>> Ismael
> > > > >> >>>>>>>>>>
> > > > >> >>>>>>>>>
> > > > >> >>>>>>>
> > > > >> >>>>>>>
> > > > >> >>>>>>
> > > > >> >>>>>
> > > > >> >>>>
> > > > >> >>>
> > > > >> >>
> > > > >>
> > > > >>
> > > >
> > >
> >
>
