Sorry, but to put it bluntly, the current build setup isn't good enough at
partial rebuilds for build caching to make sense. All Kafka devs have had
the experience of needing to clean the build directory in order to get a valid
build. The Scala code especially seems to have this issue.

regards,
Colin


On Tue, Jan 2, 2024, at 07:00, Nick Telford wrote:
> Addendum: I've opened a PR with what I believe are the changes necessary to
> enable Remote Build Caching, if you choose to go that route:
> https://github.com/apache/kafka/pull/15109
>
> On Tue, 2 Jan 2024 at 14:31, Nick Telford <nick.telf...@gmail.com> wrote:
>
>> Hi everyone,
>>
>> Regarding building a "dependency graph"... Gradle already has this
>> information, albeit fairly coarse-grained. You might be able to get some
>> considerable improvement by configuring the Gradle Remote Build Cache. It
>> looks like it's currently disabled explicitly:
>> https://github.com/apache/kafka/blob/trunk/settings.gradle#L46
>>
>> The trick is to have trunk builds write to the cache, and PR builds only
>> read from it. This way, any PR based on trunk should be able to cache not
>> only the compilation, but also the tests from dependent modules that
>> haven't changed (e.g. for a PR that only touches the connect/streams
>> modules).
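A toy model may make the "trunk writes, PR builds only read" idea concrete. This is not how Gradle implements its build cache: Gradle derives cache keys from more inputs than this (such as the task implementation and classpath), and in a real setup the read/write split is controlled by the `push` setting of the remote cache in settings.gradle. Here the key is simply a hash of a task name plus its input files, and `cache_key`, `ToyRemoteCache` and `run_task` are hypothetical names used only for illustration.

```python
from __future__ import annotations

import hashlib


def cache_key(task: str, inputs: dict[str, str]) -> str:
    """Hash a task name together with its input files (path -> content)."""
    digest = hashlib.sha256(task.encode())
    for path in sorted(inputs):  # sort so the key ignores dict ordering
        digest.update(path.encode() + b"\0")
        digest.update(inputs[path].encode() + b"\0")
    return digest.hexdigest()


class ToyRemoteCache:
    """Hypothetical stand-in for a shared remote build cache."""

    def __init__(self) -> None:
        self._entries: dict[str, str] = {}

    def load(self, key: str) -> str | None:
        return self._entries.get(key)

    def store(self, key: str, output: str, push: bool) -> None:
        # Only builds allowed to push (trunk, in this scheme) may write.
        if push:
            self._entries[key] = output


def run_task(cache: ToyRemoteCache, task: str, inputs: dict[str, str],
             push: bool) -> tuple[str, bool]:
    """Return (output, cache_hit); actually running the task is simulated."""
    key = cache_key(task, inputs)
    cached = cache.load(key)
    if cached is not None:
        return cached, True
    output = f"result of {task}"  # placeholder for compiling/testing
    cache.store(key, output, push)
    return output, False


# Trunk build: every task misses and its result is pushed.
trunk_clients = {"clients/src/A.java": "class A {}"}
trunk_streams = {"streams/src/B.java": "class B {}"}
cache = ToyRemoteCache()
run_task(cache, ":clients:test", trunk_clients, push=True)
run_task(cache, ":streams:test", trunk_streams, push=True)

# PR build that only edits Streams sources.
pr_streams = {"streams/src/B.java": "class B { int x; }"}
print(run_task(cache, ":clients:test", trunk_clients, push=False))  # ('result of :clients:test', True)
print(run_task(cache, ":streams:test", pr_streams, push=False))     # ('result of :streams:test', False)
```

The unchanged clients task is served from the cache, while the edited Streams task runs; because the PR build does not push, its result never becomes visible to other builds, which is what keeps untrusted PR output out of the shared cache.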
>>
>> This would probably be preferable to having to hand-maintain some
>> rules/dependency graph in the CI configuration, and it's quite
>> straightforward to configure.
>>
>> Bonus points if the Remote Build Cache is readable publicly, enabling
>> contributors to benefit from it locally.
>>
>> Regards,
>> Nick
>>
>> On Tue, 2 Jan 2024 at 13:00, Lucas Brutschy <lbruts...@confluent.io.invalid> wrote:
>>
>>> Thanks for all the work that has already been done on this in the past
>>> days!
>>>
>>> Have we considered running our test suite with
>>> -XX:+HeapDumpOnOutOfMemoryError and uploading the heap dumps as
>>> Jenkins build artifacts? This could speed up debugging. Even if we
>>> store them only for a day and do it only for trunk, I think it could
>>> be worth it. The heap dumps shouldn't contain any secrets, and I
>>> checked with the ASF infra team, and they are not concerned about the
>>> additional disk usage.
>>>
>>> Cheers,
>>> Lucas
>>>
>>> On Wed, Dec 27, 2023 at 2:25 PM Divij Vaidya <divijvaidy...@gmail.com> wrote:
>>> >
>>> > I have started to perform an analysis of the OOM at
>>> > https://issues.apache.org/jira/browse/KAFKA-16052. Please feel free to
>>> > contribute to the investigation.
>>> >
>>> > --
>>> > Divij Vaidya
>>> >
>>> >
>>> >
>>> > On Wed, Dec 27, 2023 at 1:23 AM Justine Olshan <jols...@confluent.io.invalid> wrote:
>>> >
>>> > > I am still seeing quite a few OOM errors in the builds and I was curious if
>>> > > folks had any ideas on how to identify the cause and fix the issue. I was
>>> > > looking in Gradle Enterprise and found some info about memory usage, but
>>> > > nothing detailed enough to help figure out the issue.
>>> > >
>>> > > OOMs sometimes fail the build immediately, and in other cases I see the
>>> > > build get stuck for 8 hours. (See
>>> > > https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka/detail/trunk/2508/pipeline/12
>>> > > )
>>> > >
>>> > > I appreciate all the work folks are doing here and I will continue to try
>>> > > to help as best as I can.
>>> > >
>>> > > Justine
>>> > >
>>> > > On Tue, Dec 26, 2023 at 1:04 PM David Arthur <david.art...@confluent.io.invalid> wrote:
>>> > >
>>> > > > S2. We’ve looked into this before, and it wasn’t possible at the time with
>>> > > > JUnit. We commonly set a timeout on each test class (especially
>>> > > > integration tests). It is probably worth looking at this again and seeing
>>> > > > if something has changed with JUnit (or our usage of it) that would allow a
>>> > > > global timeout.
>>> > > >
>>> > > >
>>> > > > S3. Dedicated infra sounds nice, if we can get it. It would at least remove
>>> > > > some variability between the builds, and hopefully eliminate the
>>> > > > infra/setup class of failures.
>>> > > >
>>> > > >
>>> > > > S4. Running tests for what has changed sounds nice, but I think it is risky
>>> > > > to implement broadly. As Sophie mentioned, there are probably some lines we
>>> > > > could draw where we feel confident that only running a subset of tests is
>>> > > > safe. As a start, we could probably work towards skipping CI for non-code
>>> > > > PRs.
>>> > > >
>>> > > >
>>> > > > ---
>>> > > >
>>> > > >
>>> > > > As an aside, I experimented with build caching and running affected tests a
>>> > > > few months ago. I used the opportunity to play with GitHub Actions, and I
>>> > > > quite liked it. Here’s the workflow I used:
>>> > > > https://github.com/mumrah/kafka/blob/trunk/.github/workflows/push.yml. I
>>> > > > was trying to see if we could use a build cache to reduce the compilation
>>> > > > time on PRs. A nightly/periodic job would build trunk and populate a Gradle
>>> > > > build cache. PR builds would read from that cache, which would enable them
>>> > > > to only compile changed code. The same idea could be extended to tests, but
>>> > > > I didn’t get that far.
>>> > > >
>>> > > >
>>> > > > As for GitHub Actions, the idea there is that ASF would provide generic
>>> > > > Action “runners” that would pick up jobs from the GitHub Actions build queue
>>> > > > and run them. It is also possible to self-host runners to expand the build
>>> > > > capacity of the project (i.e., other organizations could donate build
>>> > > > capacity). The advantage of this is that we would have more control over
>>> > > > our build/reports and not be “stuck” with whatever ASF Jenkins offers.
>>> > > > The Actions workflows are very customizable and would let us create our
>>> > > > own custom plugins. There is also a substantial marketplace of plugins. I
>>> > > > think it’s worth exploring this more; I just haven’t had time lately.
>>> > > >
>>> > > > On Tue, Dec 26, 2023 at 3:24 PM Sophie Blee-Goldman <sop...@responsive.dev> wrote:
>>> > > >
>>> > > > > Regarding:
>>> > > > >
>>> > > > > > S-4. Separate tests run depending on which module is changed.
>>> > > > > > - This makes sense, although it is tricky to implement successfully, as
>>> > > > > > unrelated tests may expose problems in an unrelated change (e.g. changing
>>> > > > > > core stuff like clients, the server, etc.)
>>> > > > >
>>> > > > > IMO this avenue could provide a massive improvement to dev productivity
>>> > > > > with very little effort or investment, and if we do it right, without
>>> > > > > even any risk. We should be able to draft a simple dependency graph
>>> > > > > between modules and then skip the tests for anything that is clearly,
>>> > > > > provably unrelated to and/or upstream of the target changes. This has the
>>> > > > > potential to substantially speed up and improve the developer experience
>>> > > > > in modules at the end of the dependency graph, which I believe is worth
>>> > > > > doing even if it unfortunately would not benefit everyone equally.
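A minimal sketch of the dependency-graph idea, assuming a hand-written and deliberately simplified module graph: the modules whose tests must run are the changed modules plus everything downstream of them, and anything upstream or unrelated can be skipped. The `MODULE_DEPS` edges are illustrative, not read from Kafka's build.gradle; Connect and Streams are given an edge to core only to stand in for their embedded-broker integration tests.

```python
from collections import deque

# Hypothetical, simplified graph: module -> modules it depends on.
# Connect and Streams point at core only to model their embedded-broker tests.
MODULE_DEPS = {
    "clients": set(),
    "server-common": {"clients"},
    "storage": {"clients", "server-common"},
    "raft": {"clients", "server-common"},
    "core": {"clients", "server-common", "storage", "raft"},
    "connect": {"clients", "core"},
    "streams": {"clients", "core"},
}


def modules_to_test(changed, deps):
    """Return the changed modules plus everything downstream of them."""
    # Invert the edges so we can walk from a module to its dependents.
    dependents = {module: set() for module in deps}
    for module, upstream in deps.items():
        for dep in upstream:
            dependents.setdefault(dep, set()).add(module)

    selected = set(changed)
    queue = deque(changed)
    while queue:  # breadth-first search over the inverted graph
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in selected:
                selected.add(dependent)
                queue.append(dependent)
    return selected


print(sorted(modules_to_test({"streams"}, MODULE_DEPS)))  # ['streams']
print(sorted(modules_to_test({"storage"}, MODULE_DEPS)))  # ['connect', 'core', 'storage', 'streams']
```

With this coarse graph a core or storage change still pulls in every Streams test; skipping the Streams unit tests in that case would need the finer-grained, test-level graph described further on in this message.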
>>> > > > >
>>> > > > > For example, we can save a lot of grief with just a simple set of rules
>>> > > > > that are easy to check. I'll throw out a few to start with:
>>> > > > >
>>> > > > >    1. A pure docs PR (i.e. one that only touches files under the docs/
>>> > > > >    directory) should be allowed to skip the tests of all modules
>>> > > > >    2. Connect PRs (that only touch connect/) only need to run the Connect
>>> > > > >    tests -- i.e. they can skip the tests for core, clients, streams, etc.
>>> > > > >    3. Similarly, Streams PRs should only need to run the Streams tests --
>>> > > > >    but again, only if all the changes are contained within streams/
>>> > > > >
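The three starter rules could be checked against a PR's list of changed file paths roughly as follows. This is a sketch: `select_test_modules` and the abbreviated `ALL_MODULES` set are hypothetical, and any PR that does not match one of the rules falls back to running the full suite.

```python
# Hypothetical, abbreviated module list; the real build has more modules.
ALL_MODULES = frozenset({"clients", "connect", "core", "raft", "storage", "streams"})


def _all_under(paths, prefix):
    """True if every changed path lives under the given directory prefix."""
    return all(path.startswith(prefix) for path in paths)


def select_test_modules(changed_paths):
    """Map a PR's changed file paths to the set of modules whose tests run."""
    paths = list(changed_paths)
    if not paths:
        return set(ALL_MODULES)  # nothing to reason about: stay conservative
    if _all_under(paths, "docs/"):
        return set()  # rule 1: pure docs PR skips all module tests
    if _all_under(paths, "connect/"):
        return {"connect"}  # rule 2: Connect-only PR
    if _all_under(paths, "streams/"):
        return {"streams"}  # rule 3: Streams-only PR
    return set(ALL_MODULES)  # anything else: run the full suite


print(select_test_modules(["docs/ops.html"]))  # set()
print(select_test_modules(["streams/src/main/java/Foo.java"]))  # {'streams'}
```

A mixed PR, for example Streams code plus a docs page, falls through to the full suite under these rules as stated; ignoring docs/ paths before applying rules 2 and 3 would be one possible refinement.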
>>> > > > > I'll let others chime in on how or if we can construct some safe rules as
>>> > > > > to which modules can or can't be skipped between the core, clients, raft,
>>> > > > > storage, etc.
>>> > > > >
>>> > > > > And over time we could in theory build up a literal dependency graph on a
>>> > > > > more granular level so that, for example, changes to the core/storage
>>> > > > > module are allowed to skip any Streams tests that don't use an embedded
>>> > > > > broker, i.e. all unit tests and TopologyTestDriver-based integration tests.
>>> > > > > The danger here would be in making sure this graph is kept up to date as
>>> > > > > tests are added and changed, but my point is just that there's a way to
>>> > > > > extend the benefit of this tactic to those who work primarily on the core
>>> > > > > module as well. Personally, I think we should just start out with the
>>> > > > > example ruleset listed above, workshop it a bit since there might be other
>>> > > > > obvious rules I left out, and try to implement it.
>>> > > > >
>>> > > > > Thoughts?
>>> > > > >
>>> > > > > On Tue, Dec 26, 2023 at 2:25 AM Stanislav Kozlovski <stanis...@confluent.io.invalid> wrote:
>>> > > > >
>>> > > > > > Great discussion!
>>> > > > > >
>>> > > > > >
>>> > > > > > Greg, that was a good call-out regarding the two long-running builds. I
>>> > > > > > missed that 90d view.
>>> > > > > >
>>> > > > > > My takeaway from that is that our average build time for tests is between
>>> > > > > > 3 and 4 hours, which in and of itself seems large.
>>> > > > > >
>>> > > > > > But then, reconciling this with Sophie's statement: is it possible that
>>> > > > > > these timed-out 8-hour builds don't get captured in that view?
>>> > > > > >
>>> > > > > > It is weird that people are reporting these things and Gradle Enterprise
>>> > > > > > isn't showing them.
>>> > > > > >
>>> > > > > > ---
>>> > > > > >
>>> > > > > > > I think that these particularly nasty builds could be explained by
>>> > > > > > > long-tail slowdowns causing arbitrary tests to take an excessive time
>>> > > > > > > to execute.
>>> > > > > >
>>> > > > > > I'm not sure I understood that. If the tests have timeouts, where would the
>>> > > > > > slowdown come from? Problems in tearing down the test?
>>> > > > > >
>>> > > > > > ---
>>> > > > > >
>>> > > > > > David, thanks for the great work in identifying and even fixing those two
>>> > > > > > top offenders! And thank you for cherry-picking to 3.7.
>>> > > > > >
>>> > > > > > --
>>> > > > > >
>>> > > > > > All in all, from this thread I can summarize a few potential solutions:
>>> > > > > >
>>> > > > > > S-1. Dedicated work identifying and fixing some of the issues (e.g. what
>>> > > > > > David did).
>>> > > > > > - Should help alleviate the issues, as it can be speculated that it's
>>> > > > > > frequently 1 or 2 tests causing the majority of issues.
>>> > > > > > - With regard to that, KAFKA-16045 seems open for taking if there are any
>>> > > > > > volunteers.
>>> > > > > > - Sophie's list also contains good candidates.
>>> > > > > >
>>> > > > > > S-2. Global 10-minute timeout for tests.
>>> > > > > > - Should lay the foundation for a strong catch-all for any misbehaving
>>> > > > > > tests. I like this idea since it's guaranteed to save each contributor
>>> > > > > > many hours of waiting for an 8hr+ build to time out.
>>> > > > > > - Luke already has a PR out for this:
>>> > > > > > https://github.com/apache/kafka/pull/15065
>>> > > > > >
>>> > > > > > S-3. Separate infrastructure for our CI
>>> > > > > > - This would help with Greg's comment about the developer machine being
>>> > > > > > 2-20 times faster than the CI.
>>> > > > > > - Requires volunteer funding from external companies. If every contributor
>>> > > > > > would bring up the idea with their employer, we may be able to stitch
>>> > > > > > something together.
>>> > > > > >
>>> > > > > > S-4. Separate tests run depending on which module is changed.
>>> > > > > > - This makes sense, although it is tricky to implement successfully, as
>>> > > > > > unrelated tests may expose problems in an unrelated change (e.g. changing
>>> > > > > > core stuff like clients, the server, etc.)
>>> > > > > >
>>> > > > > > S-5. Greater committer diligence when merging PRs
>>> > > > > > - This should always be there. Unfortunately, it is a bit of a
>>> > > > > > self-perpetuating effect in that when the builds get worse, people are
>>> > > > > > incentivized to be less diligent (slowed down while in a rush to merge,
>>> > > > > > recency bias of failed builds, etc.)
>>> > > > > >
>>> > > > > > On Fri, Dec 22, 2023 at 4:16 PM Justine Olshan <jols...@confluent.io.invalid> wrote:
>>> > > > > >
>>> > > > > > > Thanks David! I think this should help a lot!
>>> > > > > > >
>>> > > > > > > While we should include these improvements, I think it is also good to
>>> > > > > > > remind folks that a lot of these issues come from merging on builds that
>>> > > > > > > regress the CI.
>>> > > > > > > I know I'm not perfect at this (and have merged on flaky and failing
>>> > > > > > > tests), but let's all be super careful going forward. There were a few
>>> > > > > > > times I retried the build 10+ times and thought it was other issues with
>>> > > > > > > the CI, but the failed builds were actually due to the changes I
>>> > > > > > > wrote/was reviewing.
>>> > > > > > >
>>> > > > > > > We all need to work together on this to ensure the builds stay healthy!
>>> > > > > > > Thanks all for being concerned about our builds!
>>> > > > > > >
>>> > > > > > > Justine
>>> > > > > > >
>>> > > > > > > On Fri, Dec 22, 2023 at 6:02 AM David Jacot <david.ja...@gmail.com> wrote:
>>> > > > > > >
>>> > > > > > > > I just merged both PRs.
>>> > > > > > > >
>>> > > > > > > > Cheers,
>>> > > > > > > > David
>>> > > > > > > >
>>> > > > > > > > On Fri, Dec 22, 2023 at 14:38, David Jacot <david.ja...@gmail.com> wrote:
>>> > > > > > > >
>>> > > > > > > > > Hey folks,
>>> > > > > > > > >
>>> > > > > > > > > I believe that my two PRs will fix most of the issues. I have also tweaked
>>> > > > > > > > > the configuration of Jenkins to fix the issues relating to cloning the
>>> > > > > > > > > repo. There may be other issues, but the overall situation should be much
>>> > > > > > > > > better when I merge those two.
>>> > > > > > > > >
>>> > > > > > > > > I will update this thread when I merge them.
>>> > > > > > > > >
>>> > > > > > > > > Cheers,
>>> > > > > > > > > David
>>> > > > > > > > >
>>> > > > > > > > > On Fri, Dec 22, 2023 at 14:22, Divij Vaidya <divijvaidy...@gmail.com> wrote:
>>> > > > > > > > >
>>> > > > > > > > >> Hey folks
>>> > > > > > > > >>
>>> > > > > > > > >> I think David (dajac) has some fixes lined up to improve CI, such as
>>> > > > > > > > >> https://github.com/apache/kafka/pull/15063 and
>>> > > > > > > > >> https://github.com/apache/kafka/pull/15062.
>>> > > > > > > > >>
>>> > > > > > > > >> I have some bandwidth for the next two days to work on fixing the CI. Let
>>> > > > > > > > >> me start by taking a look at the list that Sophie shared here.
>>> > > > > > > > >>
>>> > > > > > > > >> --
>>> > > > > > > > >> Divij Vaidya
>>> > > > > > > > >>
>>> > > > > > > > >>
>>> > > > > > > > >>
>>> > > > > > > > >> On Fri, Dec 22, 2023 at 2:05 PM Luke Chen <show...@gmail.com> wrote:
>>> > > > > > > > >>
>>> > > > > > > > >> > Hi Sophie and Philip and all,
>>> > > > > > > > >> >
>>> > > > > > > > >> > I share the same pain as you.
>>> > > > > > > > >> > I've been waiting for a CI build result in a PR for days. Unfortunately, I
>>> > > > > > > > >> > can only get 1 result each day because it takes 8 hours for each run, and
>>> > > > > > > > >> > the results are failures. :(
>>> > > > > > > > >> >
>>> > > > > > > > >> > I've looked into the 8-hour timeout build issue and would like to propose
>>> > > > > > > > >> > setting a global test timeout of 10 minutes using the JUnit 5 feature
>>> > > > > > > > >> > <https://junit.org/junit5/docs/current/user-guide/#writing-tests-declarative-timeouts-default-timeouts>.
>>> > > > > > > > >> > This way, we can fail those long-running tests quickly without impacting
>>> > > > > > > > >> > other tests.
>>> > > > > > > > >> > PR: https://github.com/apache/kafka/pull/15065
>>> > > > > > > > >> > I've tested in my local environment and it works as expected.
>>> > > > > > > > >> >
>>> > > > > > > > >> > Any feedback is welcome.
>>> > > > > > > > >> >
>>> > > > > > > > >> > Thanks.
>>> > > > > > > > >> > Luke
>>> > > > > > > > >> >
>>> > > > > > > > >> > On Fri, Dec 22, 2023 at 8:08 AM Philip Nee <philip...@gmail.com> wrote:
>>> > > > > > > > >> >
>>> > > > > > > > >> > > Hey Sophie - I've got 2 in-flight PRs, each with more than 15 retries...
>>> > > > > > > > >> > > Namely: https://github.com/apache/kafka/pull/15023 and
>>> > > > > > > > >> > > https://github.com/apache/kafka/pull/15035
>>> > > > > > > > >> > >
>>> > > > > > > > >> > > Justin filed a flaky test report here, though:
>>> > > > > > > > >> > > https://issues.apache.org/jira/browse/KAFKA-16045
>>> > > > > > > > >> > >
>>> > > > > > > > >> > > P
>>> > > > > > > > >> > >
>>> > > > > > > > >> > > On Thu, Dec 21, 2023 at 3:18 PM Sophie Blee-Goldman <sop...@responsive.dev> wrote:
>>> > > > > > > > >> > >
>>> > > > > > > > >> > > > On a related note, has anyone else had trouble getting even a single run
>>> > > > > > > > >> > > > with no build failures lately? I've had multiple pure-docs PRs blocked for
>>> > > > > > > > >> > > > days or even weeks because of miscellaneous infra, test, and timeout
>>> > > > > > > > >> > > > failures. I know we just had a discussion about whether it's acceptable to
>>> > > > > > > > >> > > > ever merge with a failing build, and the consensus (which I agree with) was
>>> > > > > > > > >> > > > NO -- but seriously, this is getting ridiculous. The build might be the
>>> > > > > > > > >> > > > worst I've ever seen it, and it just makes it really difficult to maintain
>>> > > > > > > > >> > > > goodwill with external contributors.
>>> > > > > > > > >> > > >
>>> > > > > > > > >> > > > Take for example this small docs PR:
>>> > > > > > > > >> > > > https://github.com/apache/kafka/pull/14949
>>> > > > > > > > >> > > >
>>> > > > > > > > >> > > > It's on its 7th replay, with the first 6 runs all having (at least) one
>>> > > > > > > > >> > > > build that failed completely. The issues I saw on this one PR are a good
>>> > > > > > > > >> > > > summary of what I've been seeing elsewhere, so here's the briefing:
>>> > > > > > > > >> > > >
>>> > > > > > > > >> > > > 1. Gradle issue:
>>> > > > > > > > >> > > >
>>> > > > > > > > >> > > > > * What went wrong:
>>> > > > > > > > >> > > > >
>>> > > > > > > > >> > > > > Gradle could not start your build.
>>> > > > > > > > >> > > > >
>>> > > > > > > > >> > > > > > Cannot create service of type BuildSessionActionExecutor using method LauncherServices$ToolingBuildSessionScopeServices.createActionExecutor() as there is a problem with parameter #21 of type FileSystemWatchingInformation.
>>> > > > > > > > >> > > > >    > Cannot create service of type BuildLifecycleAwareVirtualFileSystem using method VirtualFileSystemServices$GradleUserHomeServices.createVirtualFileSystem() as there is a problem with parameter #7 of type GlobalCacheLocations.
>>> > > > > > > > >> > > > >       > Cannot create service of type GlobalCacheLocations using method GradleUserHomeScopeServices.createGlobalCacheLocations() as there is a problem with parameter #1 of type List<GlobalCache>.
>>> > > > > > > > >> > > > >          > Could not create service of type FileAccessTimeJournal using GradleUserHomeScopeServices.createFileAccessTimeJournal().
>>> > > > > > > > >> > > > >             > Timeout waiting to lock journal cache (/home/jenkins/.gradle/caches/journal-1). It is currently in use by another Gradle instance.
>>> > > > > > > > >> > > > >
>>> > > > > > > > >> > > >
>>> > > > > > > > >> > > > 2. Git issue:
>>> > > > > > > > >> > > >
>>> > > > > > > > >> > > > > ERROR: Error cloning remote repo 'origin'
>>> > > > > > > > >> > > > > hudson.plugins.git.GitException: java.io.IOException: Remote call on builds43 failed
>>> > > > > > > > >> > > >
>>> > > > > > > > >> > > >
>>> > > > > > > > >> > > > 3. storage test calling System.exit (I think)
>>> > > > > > > > >> > > >
>>> > > > > > > > >> > > > > * What went wrong:
>>> > > > > > > > >> > > > >  Execution failed for task ':storage:test'.
>>> > > > > > > > >> > > > >  > Process 'Gradle Test Executor 73' finished with non-zero exit value 1
>>> > > > > > > > >> > > > >
>>> > > > > > > > >> > > > >     This problem might be caused by incorrect test process configuration.
>>> > > > > > > > >> > > >
>>> > > > > > > > >> > > >
>>> > > > > > > > >> > > > 4. 3/4 builds aborted suddenly for no clear reason
>>> > > > > > > > >> > > >
>>> > > > > > > > >> > > > 5. 1 build was aborted, 1 build failed due to a Gradle(?) issue with a
>>> > > > > > > > >> > > > storage test:
>>> > > > > > > > >> > > >
>>> > > > > > > > >> > > > > Failed to map supported failure 'org.opentest4j.AssertionFailedError: Failed to observe commit callback before timeout' with mapper 'org.gradle.api.internal.tasks.testing.failure.mappers.OpenTestAssertionFailedMapper@38bb78ea': null
>>> > > > > > > > >> > > >
>>> > > > > > > > >> > > > > * What went wrong:
>>> > > > > > > > >> > > > > Execution failed for task ':storage:test'.
>>> > > > > > > > >> > > > > > Process 'Gradle Test Executor 73' finished with non-zero exit value 1
>>> > > > > > > > >> > > > >   This problem might be caused by incorrect test process configuration.
>>> > > > > > > > >> > > > >
>>> > > > > > > > >> > > >
>>> > > > > > > > >> > > > 6.  Unknown issue with a core test:
>>> > > > > > > > >> > > >
>>> > > > > > > > >> > > > > Unexpected exception thrown.
>>> > > > > > > > >> > > > >
>>> > > org.gradle.internal.remote.internal.MessageIOException:
>>> > > > > > Could
>>> > > > > > > > not
>>> > > > > > > > >> > read
>>> > > > > > > > >> > > > > message from '/127.0.0.1:46952'.
>>> > > > > > > > >> > > > >   at
>>> > > > > > > > >> > > > >
>>> > > > > > > > >> > > >
>>> > > > > > > > >> > >
>>> > > > > > > > >> >
>>> > > > > > > > >>
>>> > > > > > > >
>>> > > > > > >
>>> > > > > >
>>> > > > >
>>> > > >
>>> > >
>>> org.gradle.internal.remote.internal.inet.SocketConnection.receive(SocketConnection.java:94)
>>> > > > > > > > >> > > > >   at
>>> > > > > > > > >> > > > >
>>> > > > > > > > >> > > >
>>> > > > > > > > >> > >
>>> > > > > > > > >> >
>>> > > > > > > > >>
>>> > > > > > > >
>>> > > > > > >
>>> > > > > >
>>> > > > >
>>> > > >
>>> > >
>>> org.gradle.internal.remote.internal.hub.MessageHub$ConnectionReceive.run(MessageHub.java:270)
>>> > > > > > > > >> > > > >   at
>>> > > > > > > > >> > > > >
>>> > > > > > > > >> > > >
>>> > > > > > > > >> > >
>>> > > > > > > > >> >
>>> > > > > > > > >>
>>> > > > > > > >
>>> > > > > > >
>>> > > > > >
>>> > > > >
>>> > > >
>>> > >
>>> org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:64)
>>> > > > > > > > >> > > > >   at
>>> > > > > > > > >> > > > >
>>> > > > > > > > >> > > >
>>> > > > > > > > >> > >
>>> > > > > > > > >> >
>>> > > > > > > > >>
>>> > > > > > > >
>>> > > > > > >
>>> > > > > >
>>> > > > >
>>> > > >
>>> > >
>>> org.gradle.internal.concurrent.AbstractManagedExecutor$1.run(AbstractManagedExecutor.java:47)
>>> > > > > > > > >> > > > >   at
>>> > > > > > > > >> > > > >
>>> > > > > > > > >> > > >
>>> > > > > > > > >> > >
>>> > > > > > > > >> >
>>> > > > > > > > >>
>>> > > > > > > >
>>> > > > > > >
>>> > > > > >
>>> > > > >
>>> > > >
>>> > >
>>> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
>>> > > > > > > > >> > > > >   at
>>> > > > > > > > >> > > > >
>>> > > > > > > > >> > > >
>>> > > > > > > > >> > >
>>> > > > > > > > >> >
>>> > > > > > > > >>
>>> > > > > > > >
>>> > > > > > >
>>> > > > > >
>>> > > > >
>>> > > >
>>> > >
>>> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
>>> > > > > > > > >> > > > >   at
>>> java.base/java.lang.Thread.run(Thread.java:1583)
>>> > > > > > > > >> > > > > Caused by: java.lang.IllegalArgumentException
>>> > > > > > > > >> > > > >   at
>>> > > > > > > > >> > > > >
>>> > > > > > > > >> > > >
>>> > > > > > > > >> > >
>>> > > > > > > > >> >
>>> > > > > > > > >>
>>> > > > > > > >
>>> > > > > > >
>>> > > > > >
>>> > > > >
>>> > > >
>>> > >
>>> org.gradle.internal.remote.internal.hub.InterHubMessageSerializer$MessageReader.read(InterHubMessageSerializer.java:72)
>>> > > > > > > > >> > > > >   at
>>> > > > > > > > >> > > > >
>>> > > > > > > > >> > > >
>>> > > > > > > > >> > >
>>> > > > > > > > >> >
>>> > > > > > > > >>
>>> > > > > > > >
>>> > > > > > >
>>> > > > > >
>>> > > > >
>>> > > >
>>> > >
>>> org.gradle.internal.remote.internal.hub.InterHubMessageSerializer$MessageReader.read(InterHubMessageSerializer.java:52)
>>> > > > > > > > >> > > > >   at
>>> > > > > > > > >> > > > >
>>> > > > > > > > >> > > >
>>> > > > > > > > >> > >
>>> > > > > > > > >> >
>>> > > > > > > > >>
>>> > > > > > > >
>>> > > > > > >
>>> > > > > >
>>> > > > >
>>> > > >
>>> > >
>>> org.gradle.internal.remote.internal.inet.SocketConnection.receive(SocketConnection.java:81)
>>> > > > > > > > >> > > > > ... 6 more
>>> > > > > > > > >> > > > >
>>> org.gradle.internal.remote.internal.ConnectException:
>>> > > > > Could
>>> > > > > > > not
>>> > > > > > > > >> > connect
>>> > > > > > > > >> > > > to
>>> > > > > > > > >> > > > > server [1d62bf97-6a3e-441d-93b6-093617cbbea9
>>> > > port:41289,
>>> > > > > > > > >> addresses:[/
>>> > > > > > > > >> > > > > 127.0.0.1]]. Tried addresses: [/127.0.0.1].
>>> > > > > > > > >> > > > >   at
>>> > > > > > > > >> > > > >
>>> > > > > > > > >> > > >
>>> > > > > > > > >> > >
>>> > > > > > > > >> >
>>> > > > > > > > >>
>>> > > > > > > >
>>> > > > > > >
>>> > > > > >
>>> > > > >
>>> > > >
>>> > >
>>> org.gradle.internal.remote.internal.inet.TcpOutgoingConnector.connect(TcpOutgoingConnector.java:67)
>>> > > > > > > > >> > > > >   at
>>> > > > > > > > >> > > > >
>>> > > > > > > > >> > > >
>>> > > > > > > > >> > >
>>> > > > > > > > >> >
>>> > > > > > > > >>
>>> > > > > > > >
>>> > > > > > >
>>> > > > > >
>>> > > > >
>>> > > >
>>> > >
>>> org.gradle.internal.remote.internal.hub.MessageHubBackedClient.getConnection(MessageHubBackedClient.java:36)
>>> > > > > > > > >> > > > >   at org.gradle.process.internal.worker.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:103)
>>> > > > > > > > >> > > > >   at org.gradle.process.internal.worker.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:65)
>>> > > > > > > > >> > > > >   at worker.org.gradle.process.internal.worker.GradleWorkerMain.run(GradleWorkerMain.java:69)
>>> > > > > > > > >> > > > >   at worker.org.gradle.process.internal.worker.GradleWorkerMain.main(GradleWorkerMain.java:74)
>>> > > > > > > > >> > > > > Caused by: java.net.ConnectException: Connection refused
>>> > > > > > > > >> > > > >   at java.base/sun.nio.ch.Net.pollConnect(Native Method)
>>> > > > > > > > >> > > > >   at java.base/sun.nio.ch.Net.pollConnectNow(Net.java:682)
>>> > > > > > > > >> > > > >   at java.base/sun.nio.ch.SocketChannelImpl.finishTimedConnect(SocketChannelImpl.java:1191)
>>> > > > > > > > >> > > > >   at java.base/sun.nio.ch.SocketChannelImpl.blockingConnect(SocketChannelImpl.java:1233)
>>> > > > > > > > >> > > > >   at java.base/sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:102)
>>> > > > > > > > >> > > > >   at org.gradle.internal.remote.internal.inet.TcpOutgoingConnector.tryConnect(TcpOutgoingConnector.java:81)
>>> > > > > > > > >> > > > >   at org.gradle.internal.remote.internal.inet.TcpOutgoingConnector.connect(TcpOutgoingConnector.java:54)
>>> > > > > > > > >> > > > > ... 5 more
>>> > > > > > > > >> > > >
>>> > > > > > > > >> > > > >  * What went wrong:
>>> > > > > > > > >> > > > Execution failed for task ':core:test'.
>>> > > > > > > > >> > > > > Process 'Gradle Test Executor 104' finished with non-zero exit value 1
>>> > > > > > > > >> > > >   This problem might be caused by incorrect test process configuration.
>>> > > > > > > > >> > > >
>>> > > > > > > > >> > > > I've seen almost all of the above issues multiple times, so
>>> > > > > > > > >> > > > it might be a good list to start with to focus any efforts
>>> > > > > > > > >> > > > on improving the build. That said, I'm not sure what we can
>>> > > > > > > > >> > > > really do about most of these, and I'm not sure how to
>>> > > > > > > > >> > > > narrow down the root cause in the more mysterious cases of
>>> > > > > > > > >> > > > aborted builds and of builds that end with "finished with
>>> > > > > > > > >> > > > non-zero exit value 1" with no additional context (that I
>>> > > > > > > > >> > > > could find).
>>> > > > > > > > >> > > >
>>> > > > > > > > >> > > > If nothing else, there seems to be something happening in
>>> > > > > > > > >> > > > one (or more) of the storage tests, because by far the most
>>> > > > > > > > >> > > > common failure I've seen is the one in 3 & 5. Unfortunately,
>>> > > > > > > > >> > > > it's not really clear to me how to tell which is the
>>> > > > > > > > >> > > > offending test, so I'm not even sure what to file a ticket
>>> > > > > > > > >> > > > for.
>>> > > > > > > > >> > > >
>>> > > > > > > > >> > > > On Tue, Dec 19, 2023 at 11:55 PM David Jacot <dja...@confluent.io.invalid> wrote:
>>> > > > > > > > >> > > >
>>> > > > > > > > >> > > > > The slowness of the CI is definitely causing us a lot of pain. I
>>> > > > > > > > >> > > > > wonder if we should move to a dedicated CI infrastructure for Kafka.
>>> > > > > > > > >> > > > > Our integration tests are quite heavy and ASF's CI is not really
>>> > > > > > > > >> > > > > tuned for them. We could tune it for our needs, and this would also
>>> > > > > > > > >> > > > > allow external companies to sponsor more workers. I heard that we
>>> > > > > > > > >> > > > > have a few cloud providers in the community ;). I think that we
>>> > > > > > > > >> > > > > should consider this. What do you think? I already discussed this
>>> > > > > > > > >> > > > > with the INFRA team. I could continue if we believe that it is a
>>> > > > > > > > >> > > > > way forward.
>>> > > > > > > > >> > > > >
>>> > > > > > > > >> > > > > Best,
>>> > > > > > > > >> > > > > David
>>> > > > > > > > >> > > > >
>>> > > > > > > > >> > > > > On Wed, Dec 20, 2023 at 12:17 AM Stanislav Kozlovski <stanis...@confluent.io.invalid> wrote:
>>> > > > > > > > >> > > > >
>>> > > > > > > > >> > > > > > Hey Николай,
>>> > > > > > > > >> > > > > >
>>> > > > > > > > >> > > > > > Apologies about this - I wasn't aware of this behavior. I have
>>> > > > > > > > >> > > > > > made all the gists public.
>>> > > > > > > > >> > > > > >
>>> > > > > > > > >> > > > > > On Wed, Dec 20, 2023 at 12:09 AM Greg Harris <greg.har...@aiven.io.invalid> wrote:
>>> > > > > > > > >> > > > > >
>>> > > > > > > > >> > > > > > > Hey Stan,
>>> > > > > > > > >> > > > > > >
>>> > > > > > > > >> > > > > > > Thanks for opening the discussion. I haven't been looking at
>>> > > > > > > > >> > > > > > > overall build duration recently, so it's good that you are
>>> > > > > > > > >> > > > > > > calling it out.
>>> > > > > > > > >> > > > > > >
>>> > > > > > > > >> > > > > > > I worry about us over-indexing on this one build, which itself
>>> > > > > > > > >> > > > > > > appears to be an outlier. I only see one other build [1] above
>>> > > > > > > > >> > > > > > > 6h overall in the last 90 days in this view: [2]
>>> > > > > > > > >> > > > > > > And I don't see any overlap of failed tests in these two builds,
>>> > > > > > > > >> > > > > > > which makes it less likely that these particular failed tests
>>> > > > > > > > >> > > > > > > are the causes of long build times.
>>> > > > > > > > >> > > > > > >
>>> > > > > > > > >> > > > > > > Separately, I've been investigating build environment slowness,
>>> > > > > > > > >> > > > > > > and trying to connect it with test failures [3]. I observed that
>>> > > > > > > > >> > > > > > > the CI build environment is 2-20 times slower than my developer
>>> > > > > > > > >> > > > > > > machine (M1 Mac).
>>> > > > > > > > >> > > > > > > When I simulate a similar slowdown locally, there are tests
>>> > > > > > > > >> > > > > > > which become significantly more flaky, often due to hard-coded
>>> > > > > > > > >> > > > > > > timeouts.
>>> > > > > > > > >> > > > > > > I think that these particularly nasty builds could be explained
>>> > > > > > > > >> > > > > > > by long-tail slowdowns causing arbitrary tests to take an
>>> > > > > > > > >> > > > > > > excessive time to execute.
>>> > > > > > > > >> > > > > > >
>>> > > > > > > > >> > > > > > > Rather than trying to find signals in these rare test failures,
>>> > > > > > > > >> > > > > > > I think we should find tests that have these sorts of failures
>>> > > > > > > > >> > > > > > > more regularly.
>>> > > > > > > > >> > > > > > > There are lots of builds in the 5-6h duration bracket, which is
>>> > > > > > > > >> > > > > > > certainly unacceptably long. We should look into these builds
>>> > > > > > > > >> > > > > > > to find improvements and optimizations.
>>> > > > > > > > >> > > > > > >
>>> > > > > > > > >> > > > > > > [1] https://ge.apache.org/s/ygh4gbz4uma6i/
>>> > > > > > > > >> > > > > > > [2] https://ge.apache.org/scans?list.sortColumn=buildDuration&search.relativeStartTime=P90D&search.rootProjectNames=kafka&search.tags=trunk&search.timeZoneId=America%2FNew_York
>>> > > > > > > > >> > > > > > > [3] https://github.com/apache/kafka/pull/15008
>>> > > > > > > > >> > > > > > >
>>> > > > > > > > >> > > > > > > Thanks for looking into this!
>>> > > > > > > > >> > > > > > > Greg
>>> > > > > > > > >> > > > > > >
>>> > > > > > > > >> > > > > > > On Tue, Dec 19, 2023 at 3:45 PM Николай Ижиков <nizhi...@apache.org> wrote:
>>> > > > > > > > >> > > > > > > >
>>> > > > > > > > >> > > > > > > > Hello, Stanislav.
>>> > > > > > > > >> > > > > > > >
>>> > > > > > > > >> > > > > > > > Can you please make the gist public?
>>> > > > > > > > >> > > > > > > > Private gists are not available to some GitHub users, even if
>>> > > > > > > > >> > > > > > > > the link is known.
>>> > > > > > > > >> > > > > > > >
>>> > > > > > > > >> > > > > > > > > On 19 Dec 2023, at 17:33, Stanislav Kozlovski <stanis...@confluent.io.INVALID> wrote:
>>> > > > > > > > >> > > > > > > > >
>>> > > > > > > > >> > > > > > > > > Hey everybody,
>>> > > > > > > > >> > > > > > > > > I've heard various complaints that build times in trunk are
>>> > > > > > > > >> > > > > > > > > taking too long, some taking as much as 8 hours (the timeout),
>>> > > > > > > > >> > > > > > > > > and this is making it harder for us to meet the code freeze
>>> > > > > > > > >> > > > > > > > > deadline for 3.7.
>>> > > > > > > > >> > > > > > > > >
>>> > > > > > > > >> > > > > > > > > I took it upon myself to gather up some data in Gradle
>>> > > > > > > > >> > > > > > > > > Enterprise to see if there are any outlier tests that are
>>> > > > > > > > >> > > > > > > > > causing this slowness. Turns out there are a few in this
>>> > > > > > > > >> > > > > > > > > particular build (https://ge.apache.org/s/un2hv7n6j374k/),
>>> > > > > > > > >> > > > > > > > > which took 10 hours and 29 minutes in total.
>>> > > > > > > > >> > > > > > > > >
>>> > > > > > > > >> > > > > > > > > I have compiled the tests that took a disproportionately large
>>> > > > > > > > >> > > > > > > > > amount of time (20m+), alongside their time, error message and
>>> > > > > > > > >> > > > > > > > > a link to their full log output, here:
>>> > > > > > > > >> > > > > > > > > https://gist.github.com/stanislavkozlovski/8959f7ee59434f774841f4ae2f5228c2
>>> > > > > > > > >> > > > > > > > >
>>> > > > > > > > >> > > > > > > > > It includes failures from core, streams, storage and clients.
>>> > > > > > > > >> > > > > > > > > Interestingly, some other tests that don't fail also take a
>>> > > > > > > > >> > > > > > > > > long time in what is apparently the test harness framework.
>>> > > > > > > > >> > > > > > > > > See the gist for more information.
>>> > > > > > > > >> > > > > > > > >
>>> > > > > > > > >> > > > > > > > > I am starting this thread with the intention of getting the
>>> > > > > > > > >> > > > > > > > > discussion started and brainstorming what we can do to get the
>>> > > > > > > > >> > > > > > > > > build times back under control.
>>> > > > > > > > >> > > > > > > > >
>>> > > > > > > > >> > > > > > > > >
>>> > > > > > > > >> > > > > > > > > --
>>> > > > > > > > >> > > > > > > > > Best,
>>> > > > > > > > >> > > > > > > > > Stanislav
>>> > > > > > > > >> > > > > >
>>> > > > > > > > >> > > > > > --
>>> > > > > > > > >> > > > > > Best,
>>> > > > > > > > >> > > > > > Stanislav
>>> > > > > >
>>> > > > > > --
>>> > > > > > Best,
>>> > > > > > Stanislav
>>> > > >
>>> > > > --
>>> > > > -David
>>>
