Hey Николай,

Apologies for this; I wasn't aware of this behavior. I have made all the gists public.
On Wed, Dec 20, 2023 at 12:09 AM Greg Harris <greg.har...@aiven.io.invalid> wrote:
> Hey Stan,
>
> Thanks for opening the discussion. I haven't been looking at overall
> build duration recently, so it's good that you are calling it out.
>
> I worry about us over-indexing on this one build, which itself appears
> to be an outlier. I only see one other build [1] above 6h overall in
> the last 90 days in this view: [2]
> I also don't see any overlap of failed tests between these two builds,
> which makes it less likely that these particular failed tests are the
> cause of the long build times.
>
> Separately, I've been investigating build environment slowness and
> trying to connect it with test failures [3]. I observed that the CI
> build environment is 2-20 times slower than my developer machine (M1
> Mac).
> When I simulate a similar slowdown locally, some tests become
> significantly more flaky, often due to hard-coded timeouts.
> I think these particularly nasty builds could be explained by
> long-tail slowdowns causing arbitrary tests to take an excessive
> amount of time to execute.
>
> Rather than trying to find signals in these rare test failures, I
> think we should find the tests that fail this way more regularly.
> There are lots of builds in the 5-6h duration bracket, which is
> certainly unacceptably long. We should look into these builds to find
> improvements and optimizations.
>
> [1] https://ge.apache.org/s/ygh4gbz4uma6i/
> [2] https://ge.apache.org/scans?list.sortColumn=buildDuration&search.relativeStartTime=P90D&search.rootProjectNames=kafka&search.tags=trunk&search.timeZoneId=America%2FNew_York
> [3] https://github.com/apache/kafka/pull/15008
>
> Thanks for looking into this!
> Greg
>
> On Tue, Dec 19, 2023 at 3:45 PM Николай Ижиков <nizhi...@apache.org> wrote:
> >
> > Hello, Stanislav.
> >
> > Could you please make the gist public?
> > Private gists are not available to some GitHub users, even if the link is known.
> >
> > > On 19 Dec 2023, at 17:33, Stanislav Kozlovski <stanis...@confluent.io.INVALID> wrote:
> > >
> > > Hey everybody,
> > > I've heard various complaints that trunk builds are taking too long,
> > > some as much as 8 hours (the timeout), and this is making it harder
> > > for us to meet the code freeze deadline for 3.7.
> > >
> > > I took it upon myself to gather some data in Gradle Enterprise to see
> > > if there are any outlier tests causing this slowness. It turns out
> > > there are a few in this particular build, which took 10 hours and
> > > 29 minutes in total:
> > > https://ge.apache.org/s/un2hv7n6j374k/
> > >
> > > I have compiled the tests that took a disproportionately long time
> > > (20+ minutes), along with their durations, error messages, and links
> > > to their full log output, here:
> > > https://gist.github.com/stanislavkozlovski/8959f7ee59434f774841f4ae2f5228c2
> > >
> > > It includes failures from core, streams, storage and clients.
> > > Interestingly, some tests that don't fail also take a long time,
> > > apparently in the test harness framework. See the gist for more
> > > information.
> > >
> > > I am starting this thread to kick off a discussion and brainstorm
> > > what we can do to get build times back under control.
> > >
> > > --
> > > Best,
> > > Stanislav

--
Best,
Stanislav