I think it breaks the Jenkins output otherwise. Feel free to test it via a PR.
Ismael

On Wed, Nov 22, 2023, 12:42 AM David Jacot <dja...@confluent.io.invalid> wrote:

> Hi Ismael,
>
> No, I was not aware of KAFKA-12216. My understanding is that we could still do it without the JUnitFlakyTestDataPublisher plugin, and we could use Gradle Enterprise for this. Or do you think that reporting the flaky tests in the build results is required?
>
> David
>
> On Wed, Nov 22, 2023 at 9:35 AM Ismael Juma <m...@ismaeljuma.com> wrote:
>
> > Hi David,
> >
> > Did you take a look at https://issues.apache.org/jira/browse/KAFKA-12216? I looked into this option already (yes, there isn't much that we haven't considered in this space).
> >
> > Ismael
> >
> > On Wed, Nov 22, 2023 at 12:24 AM David Jacot <dja...@confluent.io.invalid> wrote:
> >
> > > Hi all,
> > >
> > > Thanks for the good discussion and all the comments. Overall, it seems that we all agree on the bad state of our CI. That's a good first step!
> > >
> > > I have talked to a few folks this week about it, and it seems that many folks (including me) are not comfortable with merging PRs at the moment because the results of our builds are so bad. I had 40+ failed tests in one of my PRs, all unrelated to my changes. It is really hard to be productive with this.
> > >
> > > Personally, I really want to move towards requiring a green build to merge to trunk because this is a clear and binary signal. I agree that we need to stabilize the builds before we can require this, so here is my proposal.
> > >
> > > 1) We could leverage the `reports.junitXml.mergeReruns` option in Gradle. From the doc [1]:
> > >
> > > > When mergeReruns is enabled, if a test fails but is then retried and succeeds, its failures will be recorded as <flakyFailure> instead of <failure>, within one <testcase>. This is effectively the reporting produced by the surefire plugin of Apache Maven™ when enabling reruns. If your CI server understands this format, it will indicate that the test was flaky. If it does not, it will indicate that the test succeeded as it will ignore the <flakyFailure> information. If the test does not succeed (i.e. it fails for every retry), it will be indicated as having failed whether your tool understands this format or not. When mergeReruns is disabled (the default), each execution of a test will be listed as a separate test case.
> > >
> > > It would not resolve all the flaky tests for sure, but it would at least reduce the noise. I see this as a means to get to green builds faster. I played a bit with this setting and I discovered [2]. I was hoping that [3] could help to resolve it, but I need to confirm.
> > >
> > > 2) I suppose that we would still have flaky tests preventing us from getting a green build even with the setting in place. For those, I think that we need to review them one by one and decide whether we want to fix or disable them. This is a short-term effort to help us get to green builds.
> > >
> > > 3) When we get to a point where we can get green builds consistently, we could enforce it.
> > >
> > > 4) Flaky tests won't disappear with this. They are just hidden. Therefore, we also need a process to review the flaky tests and address them. Here, I think that we could leverage the dashboard shared by Ismael. One possibility would be to review it regularly and decide for each test whether it should be fixed, disabled, or even removed.
> > >
> > > Please let me know what you think.
> > >
> > > Another angle that we could consider is improving the CI infrastructure as well. I think that many of those flaky tests are due to overloaded Jenkins workers. We should perhaps discuss with the infra team to see whether we could do something there.
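[Editor's note: point 1) above can be sketched as build configuration. This is a minimal, hedged sketch in the Gradle Groovy DSL, not Kafka's actual build.gradle; the `retry` block assumes the org.gradle.test-retry plugin (or Develocity) is applied, and the values are illustrative.]

```groovy
// Illustrative sketch only. Assumes the org.gradle.test-retry plugin
// (or Develocity) is applied, which adds the `retry` extension to Test tasks.
tasks.withType(Test).configureEach {
    retry {
        maxRetries = 1    // rerun each failed test once
        maxFailures = 10  // stop retrying when the build is badly broken
    }
    // Merge reruns of the same test into a single <testcase>; a test that
    // fails and then passes on retry is reported as <flakyFailure>
    // (supported in recent Gradle versions).
    reports.junitXml.mergeReruns = true
}
```

A CI server that understands the Surefire rerun format then shows such tests as flaky rather than failed; tools that do not understand it simply see a passing test.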
> > > Best,
> > > David
> > >
> > > [1] https://docs.gradle.org/current/userguide/java_testing.html#mergereruns
> > > [2] https://github.com/gradle/gradle/issues/23324
> > > [3] https://github.com/apache/kafka/pull/14687
> > >
> > > On Wed, Nov 22, 2023 at 4:10 AM Ismael Juma <m...@ismaeljuma.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > We have a dashboard already:
> > > >
> > > > [image: image.png]
> > > >
> > > > https://ge.apache.org/scans/tests?search.names=Git%20branch&search.relativeStartTime=P28D&search.rootProjectNames=kafka&search.timeZoneId=America%2FLos_Angeles&search.values=trunk&tests.sortField=FLAKY
> > > >
> > > > On Tue, Nov 14, 2023 at 10:41 PM Николай Ижиков <nizhi...@apache.org> wrote:
> > > >
> > > > > Hello guys.
> > > > >
> > > > > I want to tell you about one more approach to deal with flaky tests. We adopted this approach in the Apache Ignite community, so maybe it can be helpful for Kafka as well.
> > > > >
> > > > > TL;DR: The Apache Ignite community has a tool that provides test statistics and can tell whether a PR introduces new failures.
> > > > >
> > > > > Apache Ignite has many tests. The latest «Run All» contains around 75k. Most tests are integration-style, therefore the count of flaky ones is significant.
> > > > >
> > > > > We built a tool, TeamCity Bot [1], that provides combined statistics on flaky tests [2].
> > > > >
> > > > > This tool can compare the results of Run All for a PR and for master. If all is OK, one can comment on the JIRA ticket with a visa from the bot [3].
> > > > >
> > > > > A visa is a quality proof of a PR for Ignite committers. And we can sort out the most flaky tests and prioritize fixes with the bot's statistics [2].
> > > > >
> > > > > The TC Bot is integrated with TeamCity only, for now. But if the Kafka community is interested, we can try to integrate it with Jenkins.
> > > > >
> > > > > [1] https://github.com/apache/ignite-teamcity-bot
> > > > > [2] https://tcbot2.sbt-ignite-dev.ru/current.html?branch=master&count=10
> > > > > [3] https://issues.apache.org/jira/browse/IGNITE-19950?focusedCommentId=17767394&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17767394
> > > > >
> > > > > On Nov 15, 2023, at 09:18, Ismael Juma <m...@ismaeljuma.com> wrote:
> > > > >
> > > > > > To use the pain analogy, people seem to have really good painkillers and hence they somehow don't feel the pain already. ;)
> > > > > >
> > > > > > The reality is that important and high-quality tests will get fixed. Poor-quality tests (low signal-to-noise ratio) might not get fixed, and that's ok.
> > > > > >
> > > > > > I'm not opposed to marking the tests as release blockers as a starting point, but I'm saying it's fine if people triage them and decide they are not blockers. In fact, that has already happened in the past.
> > > > > >
> > > > > > Ismael
> > > > > >
> > > > > > On Tue, Nov 14, 2023 at 10:02 PM Matthias J. Sax <mj...@apache.org> wrote:
> > > > > >
> > > > > > > I agree on the test gap argument. However, my worry is that if we don't "force the pain", it won't get fixed at all. -- I also know that we have tried to find a working approach for many years...
> > > > > > >
> > > > > > > My take is that if we disable a test and file a non-blocking JIRA, it's basically the same as deleting the test altogether and never talking about it again. -- I believe this is not what we aim for; we aim for good test coverage and a way to get these tests fixed.
> > > > > > >
> > > > > > > Thus, IMHO we need some forcing function: either keep the tests and feel the pain on every PR, or disable the test and file a blocker JIRA so the pain surfaces on a release, forcing us to do something about it.
> > > > > > >
> > > > > > > If there is no forcing function, it basically means we are willing to accept test gaps forever.
> > > > > > >
> > > > > > > -Matthias
> > > > > > >
> > > > > > > On 11/14/23 9:09 PM, Ismael Juma wrote:
> > > > > > > > Matthias,
> > > > > > > >
> > > > > > > > Flaky tests are worse than useless. I know engineers find it hard to disable them because of the supposed test gap (I find it hard too), but the truth is that the test gap is already there! No one blocks merging PRs on flaky tests, but they do get used to ignoring build failures.
> > > > > > > >
> > > > > > > > The current approach has been attempted for nearly a decade and it has never worked. I think we should try something different.
> > > > > > > >
> > > > > > > > When it comes to marking flaky tests as release blockers, I don't think this should be done as a general rule. We should instead assess on a case-by-case basis, the same way we do it for bugs.
> > > > > > > >
> > > > > > > > Ismael
> > > > > > > >
> > > > > > > > On Tue, Nov 14, 2023 at 5:02 PM Matthias J. Sax <mj...@apache.org> wrote:
> > > > > > > >
> > > > > > > > > Thanks for starting this discussion, David! I totally agree with "no"!
> > > > > > > > >
> > > > > > > > > I think there is no excuse whatsoever for merging PRs with compilation errors (except an honest mistake for conflicting PRs that got merged interleaved). -- Every committer must(!) check the Jenkins status before merging to avoid such an issue.
> > > > > > > > >
> > > > > > > > > Similar for actual permanently broken tests: if there is no green build, and the same test failed across multiple Jenkins runs, a committer should detect this and cannot merge a PR.
> > > > > > > > >
> > > > > > > > > Given the current state of the CI pipeline, it seems possible to get green runs, and thus I support the policy (that we actually always had) to only merge if there is at least one green build. If committers got sloppy about this, we need to call it out and put a hold on this practice.
> > > > > > > > >
> > > > > > > > > (The only exception from the above policy would be a very unstable state in which getting a green build is not possible at all, due to too many flaky tests -- for this case, I would accept merging even if there is no green build, but committers need to manually ensure that every test passed in at least one test run. -- We had this in the past, but I don't think we are in such a bad situation right now.)
> > > > > > > > >
> > > > > > > > > About disabling tests: I was never a fan of this, because in my experience those tests are not fixed any time soon, especially because we do not consider such tickets as release blockers. To me, seeing tests fail on a PR build is actually a good forcing function for people to feel the pain, and thus get motivated to make time to fix the tests.
> > > > > > > > >
> > > > > > > > > I have to admit that I was a little bit sloppy about paying attention to flaky tests recently, and I highly appreciate this effort. Also, thanks to everyone who actually filed a ticket! IMHO, we should file a ticket for every flaky test, and also keep adding comments each time we see a test fail, to be able to track the frequency at which a test fails, so we can fix the most pressing ones first.
> > > > > > > > >
> > > > > > > > > To me, the best forcing function to get tests stabilized is to file tickets and consider them release blockers. Disabling tests does not really help much IMHO to tackle the problem (we can of course still disable them to get noise out of the system, but that would only introduce testing gaps for the time being, and it also does not help to figure out how often a test fails, so it's not a solution to the problem IMHO).
> > > > > > > > >
> > > > > > > > > -Matthias
> > > > > > > > >
> > > > > > > > > On 11/13/23 11:40 PM, Sagar wrote:
> > > > > > > > > > Hi Divij,
> > > > > > > > > >
> > > > > > > > > > I think this proposal overall makes sense. My only nit, sort of a suggestion, is that we also consider a label called newbie++ [1] for flaky tests if we are considering adding newbie as a label. I think some of the flaky tests need familiarity with the codebase or the test setup, so as a first-time contributor it might be difficult. newbie++ IMO covers that aspect.
> > > > > > > > > >
> > > > > > > > > > [1] https://issues.apache.org/jira/browse/KAFKA-15406?jql=project%20%3D%20KAFKA%20AND%20labels%20%3D%20%22newbie%2B%2B%22
> > > > > > > > > >
> > > > > > > > > > Let me know what you think.
> > > > > > > > > >
> > > > > > > > > > Thanks!
> > > > > > > > > > Sagar.
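[Editor's note: Matthias's suggestion above, tracking the frequency at which each flaky test fails so the most pressing ones get fixed first, can be sketched with a small, purely illustrative Java snippet. All test names and counts are made up; this is not Kafka tooling, just a sketch of the prioritization idea.]

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative sketch: given observed failure counts per test (e.g. tallied
// from comments on flaky-test JIRA tickets), rank tests so the most
// frequently failing ones are addressed first. Names and numbers are made up.
public class FlakyTestRanker {

    public static List<String> rankByFailures(Map<String, Integer> failureCounts) {
        return failureCounts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder()))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Integer> observed = new LinkedHashMap<>();
        observed.put("testProducerIdAllocation", 7); // hypothetical test names
        observed.put("testConsumerRebalance", 2);
        observed.put("testTransactionFencing", 4);
        // prints [testProducerIdAllocation, testTransactionFencing, testConsumerRebalance]
        System.out.println(rankByFailures(observed));
    }
}
```

The same ranking is what a dashboard such as the Gradle Enterprise one mentioned earlier in the thread surfaces automatically; the sketch just makes the triage rule explicit.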
> > > > > > > > > >
> > > > > > > > > > On Mon, Nov 13, 2023 at 9:11 PM Divij Vaidya <divijvaidy...@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > > Please, do it.
> > > > > > > > > > > > We can use specific labels to effectively filter those tickets.
> > > > > > > > > > >
> > > > > > > > > > > We already have a label and a way to discover flaky tests. They are tagged with the label "flaky-test" [1]. There is also a label "newbie" [2] meant for folks who are new to the Apache Kafka code base.
> > > > > > > > > > >
> > > > > > > > > > > My suggestion is to send a broader email to the community (since many will miss details in this thread) and call for action for committers to volunteer as "shepherds" for these tickets. I can send one out once we have some consensus wrt next steps in this thread.
> > > > > > > > > > >
> > > > > > > > > > > [1] https://issues.apache.org/jira/browse/KAFKA-13421?jql=project%20%3D%20KAFKA%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened%2C%20%22Patch%20Available%22)%20AND%20resolution%20%3D%20Unresolved%20AND%20labels%20%3D%20flaky-test%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC
> > > > > > > > > > > [2] https://kafka.apache.org/contributing -> Finding a project to work on
> > > > > > > > > > >
> > > > > > > > > > > Divij Vaidya
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Nov 13, 2023 at 4:24 PM Николай Ижиков <nizhi...@apache.org> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > > To kickstart this effort, we can publish a list of such tickets in the community and assign one or more committers the role of a "shepherd" for each ticket.
> > > > > > > > > > > >
> > > > > > > > > > > > Please, do it.
> > > > > > > > > > > > We can use a specific label to effectively filter those tickets.
> > > > > > > > > > > >
> > > > > > > > > > > > On Nov 13, 2023, at 15:16, Divij Vaidya <divijvaidy...@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Thanks for bringing this up, David.
> > > > > > > > > > > > >
> > > > > > > > > > > > > My primary concern revolves around the possibility that the currently disabled tests may remain inactive indefinitely. We currently have unresolved JIRA tickets for flaky tests that have been pending for an extended period. I am inclined to support the idea of disabling these tests temporarily and merging changes only when the build is successful, provided there is a clear plan for re-enabling them in the future.
> > > > > > > > > > > > >
> > > > > > > > > > > > > To address this issue, I propose the following measures:
> > > > > > > > > > > > >
> > > > > > > > > > > > > 1\ Foster a supportive environment for new contributors within the community, encouraging them to take on tickets associated with flaky tests. This initiative would require individuals familiar with the relevant code to offer guidance to those undertaking these tasks. Committers should prioritize reviewing and addressing these tickets within their available bandwidth. To kickstart this effort, we can publish a list of such tickets in the community and assign one or more committers the role of a "shepherd" for each ticket.
> > > > > > > > > > > > >
> > > > > > > > > > > > > 2\ Implement a policy to block minor version releases until the Release Manager (RM) is satisfied that the disabled tests do not result in gaps in our testing coverage. The RM may rely on Subject Matter Experts (SMEs) in the specific code areas to provide assurance before giving the green light for a release.
> > > > > > > > > > > > >
> > > > > > > > > > > > > 3\ Set a community-wide goal for 2024 to achieve a stable Continuous Integration (CI) system. This goal should encompass projects such as refining our test suite to eliminate flakiness and addressing infrastructure issues if necessary. By publishing this goal, we create a shared vision for the community in 2024, fostering alignment on our objectives. This alignment will aid in prioritizing tasks for community members and guide reviewers in allocating their bandwidth effectively.
> > > > > > > > > > > > >
> > > > > > > > > > > > > --
> > > > > > > > > > > > > Divij Vaidya
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Sun, Nov 12, 2023 at 2:58 AM Justine Olshan <jols...@confluent.io.invalid> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > I will say that I have also seen tests that seem to be more flaky intermittently. It may be ok for some time, and then suddenly the CI is overloaded and we see issues.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I have also seen the CI struggling with running out of space recently, so I wonder if we can also try to improve things on that front.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > FWIW, I noticed, filed, or commented on several flaky test JIRAs last week. I'm happy to try to get to green builds, but everyone needs to be on board.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > https://issues.apache.org/jira/browse/KAFKA-15529
> > > > > > > > > > > > > > https://issues.apache.org/jira/browse/KAFKA-14806
> > > > > > > > > > > > > > https://issues.apache.org/jira/browse/KAFKA-14249
> > > > > > > > > > > > > > https://issues.apache.org/jira/browse/KAFKA-15798
> > > > > > > > > > > > > > https://issues.apache.org/jira/browse/KAFKA-15797
> > > > > > > > > > > > > > https://issues.apache.org/jira/browse/KAFKA-15690
> > > > > > > > > > > > > > https://issues.apache.org/jira/browse/KAFKA-15699
> > > > > > > > > > > > > > https://issues.apache.org/jira/browse/KAFKA-15772
> > > > > > > > > > > > > > https://issues.apache.org/jira/browse/KAFKA-15759
> > > > > > > > > > > > > > https://issues.apache.org/jira/browse/KAFKA-15760
> > > > > > > > > > > > > > https://issues.apache.org/jira/browse/KAFKA-15700
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I've also seen that KRaft transactions tests often flakily find that the producer id is not allocated and time out. I can file a JIRA for that too.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Hopefully this is a place we can start from.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Justine
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Sat, Nov 11, 2023 at 11:35 AM Ismael Juma <m...@ismaeljuma.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Sat, Nov 11, 2023 at 10:32 AM John Roesler <vvcep...@apache.org> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > In other words, I'm biased to think that new flakiness indicates non-deterministic bugs more often than it indicates a bad test.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > My experience is exactly the opposite. As someone who has tracked many of the flaky fixes, the vast majority of the time they are an issue with the test.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Ismael