Gaurav,

The target-determinator project is certainly the sort of "off-the-shelf" solution I expected would be out there. If the project migrates to Bazel, I think that would make partial builds much easier to implement. Even if the partial-builds feature is decided against, or is later reverted, I think we should look into the other benefits of migrating to Bazel to see whether it is worth it on its own.
Chris,

> Do you think we should aim to disable
> merges without a full suite of passing CI runs (allowing for administrative
> override in an emergency)? If so, what would the path be from our current
> state to there? What can we do to ensure that we don't get stuck relying on
> a once-temporary aid that becomes effectively permanent?

Yes, I think it would be nice to require a green build to merge, without the sort of heavyweight merge-queue infrastructure that is more common in high-volume monorepos.

1. We would decide on a sunset date for the partial-build indicator, at which point it will be disabled on trunk and all release branches. I suspect that a sunset date could be set for 1-3 releases in the future.
2. We would enable partial builds. The merge button would remain as-is, with the partial-build and full-suite indicators both visible.
3. Communication would be sent out to explain that the partial-build indicators are a temporary flakiness-reduction tool.
4. Committers would continue to rely on the full-suite indicators, and watch the partial build to get a sense of the level of flakiness in each module.
5. On new contributions with failing partial builds, committers can ask contributors to investigate and follow up on flaky failures that appeared in their PR, explaining that doing so helps make that indicator green for future contributions.
6. After the sunset date passes, the partial-build indicators will be disabled.
7. If the main trunk build is sufficiently reliable by some criteria (e.g. 20 passes in a row), we can discuss requiring a green build for the GitHub merge button.

> We probably
> want to build awareness of this dependency graph into any partial CI logic
> we add, but if we do opt for that, then this change would
> disproportionately benefit downstream modules (Streams, Connect, MM2), and
> have little to no benefit for upstream ones (clients and at least some core
> modules).
I considered that, and I think it is more beneficial to provide equitable partial builds than ones with perfect coverage. If you want to know about cross-module failures, the full suite is still the appropriate tool to detect those.

From another perspective, if this change is only useful to `core` after all other dependent modules have addressed their flakiness, then it delays `core` contributors' reward for addressing their own flakiness. And if anything, making the partial build cover fewer failure modes, and reminding people of that fact, will keep the full-suite builds relevant.

> but people should already be making
> sure that tests are running locally before they push changes

I agree, and I don't think that partial builds are in any danger of replacing local testing for fast synchronous iteration. I also agree that it is appropriate to give personal feedback to repeat contributors to improve the quality of their PRs.

CI seems to be for enforcing the lowest common denominator on contributions, and for benefiting (potentially first-time) contributors who do not have the familiarity, mindfulness, or resources to pre-test each of their contributions. It is in those situations that I think partial builds can be beneficial: if there is something mechanical that all contributors should be doing locally, why not have it done automatically on their behalf?

> Finally, since there are a number of existing flaky tests on trunk, what
> would the strategy be for handling those? Do we try to get to a green state
> on a per-module basis (possibly with awareness of downstream modules) as
> quickly as possible, and then selectively enable partial builds once we
> feel confident that flakiness has been addressed?

For modules which receive regular PRs, the partial builds would surface those flakes, incentivising fixing them.
Once the per-module flakiness is under control, committers for those modules can start to require green partial builds, pushing back on PRs which have obvious failures.

For modules which do not receive regular PRs, those flakes will appear in all of the full-suite builds, and wouldn't be addressed by the partial builds. Those would require either another incentive structure, or volunteers from other modules to help fix them. I'm not sure which modules receive the least PR traffic, but I can see that the currently most stale modules are log4j-appender (1 year ago), streams:examples (6 months ago), connect:file (5 months ago), storage:api (5 months ago), and streams-scala (5 months ago).

Thanks for the feedback all,
Greg

On Wed, Jun 7, 2023 at 7:55 AM Chris Egerton <chr...@aiven.io.invalid> wrote:
>
> Hi Greg,
>
> I can see the point about enabling partial runs as a temporary measure to
> fight flakiness, and it does carry some merit. In that case, though, we
> should have an idea of what the desired end state is once we've stopped
> relying on any temporary measures. Do you think we should aim to disable
> merges without a full suite of passing CI runs (allowing for administrative
> override in an emergency)? If so, what would the path be from our current
> state to there? What can we do to ensure that we don't get stuck relying on
> a once-temporary aid that becomes effectively permanent?
>
> With partial builds, we also need to be careful to make sure to correctly
> handle cross-module dependencies. A tweak to broker or client logic may
> only affect files in one module and pass all tests for that module, but
> have far-reaching consequences for Streams, Connect, and MM2.
> We probably
> want to build awareness of this dependency graph into any partial CI logic
> we add, but if we do opt for that, then this change would
> disproportionately benefit downstream modules (Streams, Connect, MM2), and
> have little to no benefit for upstream ones (clients and at least some core
> modules).
>
> With regards to faster iteration times--I agree that it would be nice if
> our CI builds didn't take 2-3 hours, but people should already be making
> sure that tests are running locally before they push changes (or, if they
> really want, they can run tests locally after pushing changes). And if
> rapid iteration is necessary, it's always (or at least for the foreseeable
> future) going to be faster to run whatever specific tests or build tasks
> you need to run locally, instead of pushing to GitHub and waiting for
> Jenkins to check for you.
>
> Finally, since there are a number of existing flaky tests on trunk, what
> would the strategy be for handling those? Do we try to get to a green state
> on a per-module basis (possibly with awareness of downstream modules) as
> quickly as possible, and then selectively enable partial builds once we
> feel confident that flakiness has been addressed?
>
> Cheers,
>
> Chris
>
> On Wed, Jun 7, 2023 at 5:09 AM Gaurav Narula <ka...@gnarula.com> wrote:
>
> > Hey Greg,
> >
> > Thanks for sharing this idea!
> >
> > The idea of building and testing a relevant subset of code certainly seems
> > interesting.
> >
> > Perhaps this is a good fit for Bazel [1], where
> > target-determinator [2] can be used to find a subset of targets that
> > have changed between two commits.
> >
> > Even without [2], Bazel builds can benefit immensely from distributing
> > builds to a set of remote nodes [3], with support for caching previously
> > built targets [4].
> >
> > We've seen a few other ASF projects adopt Bazel as well:
> >
> > * https://github.com/apache/rocketmq
> > * https://github.com/apache/brpc
> > * https://github.com/apache/trafficserver
> > * https://github.com/apache/ws-axiom
> >
> > I wonder how the Kafka community feels about experimenting with Bazel and
> > exploring if it helps us offer faster build times without compromising on
> > the correctness of the targets that need to be built and tested?
> >
> > Thanks,
> > Gaurav
> >
> > [1]: https://bazel.build
> > [2]: https://github.com/bazel-contrib/target-determinator
> > [3]: https://bazel.build/remote/rbe
> > [4]: https://bazel.build/remote/caching
> >
> > On 2023/06/05 17:47:07 Greg Harris wrote:
> > > Hey all,
> > >
> > > I've been working on test flakiness recently, and I've been trying to
> > > come up with ways to tackle the issue top-down as well as bottom-up,
> > > and I'm interested to hear your thoughts on an idea.
> > >
> > > In addition to the current full-suite runs, can we in parallel trigger
> > > a smaller test run which has only a relevant subset of tests? For
> > > example, if someone is working on one sub-module, the CI would only
> > > run tests in that module.
> > >
> > > I think this would be more likely to pass than the full suite, due to
> > > fewer tests failing probabilistically, and would improve the
> > > signal-to-noise ratio of the summary pass/fail marker on GitHub. This
> > > should also be shorter to execute than the full suite, allowing for a
> > > faster cycle time than the current full suite encourages.
> > >
> > > This would also strengthen the incentive for contributors specializing
> > > in a module to de-flake tests, as they are rewarded with a tangible
> > > improvement within their area of the project. Currently, even the
> > > modules with the most reliable tests receive consistent CI failures
> > > from other, less reliable modules.
> > >
> > > I believe this is possible, even if there isn't an off-the-shelf
> > > solution for it. We can learn of the changed files via a git diff, map
> > > that to modules containing those files, and then execute the tests
> > > just for those modules with Gradle. GitHub also permits showing
> > > multiple "checks", so we can emit both the full-suite and partial
> > > test results.
> > >
> > > Thanks,
> > > Greg
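[Editor's note: the mechanism described in the original proposal quoted above (git diff -> affected modules -> per-module Gradle test tasks) could be sketched roughly as below. This is an illustrative sketch only; the module names and nesting are hypothetical examples, not the actual Kafka build layout, and in CI the changed-file list would come from something like `git diff --name-only trunk...HEAD`.]

```python
# Sketch: map changed file paths to the Gradle modules that contain them,
# then emit the corresponding per-module test tasks.

def modules_for_changed_files(changed_files, known_modules):
    """Return the set of known modules that contain at least one changed file."""
    affected = set()
    for path in changed_files:
        # Longest-prefix match so a nested module wins over its parent directory.
        matches = [m for m in known_modules
                   if path == m or path.startswith(m + "/")]
        if matches:
            affected.add(max(matches, key=len))
    return affected

def gradle_test_tasks(modules):
    """Translate module directories into Gradle task names (dir separators -> ':')."""
    return sorted(":" + m.replace("/", ":") + ":test" for m in modules)

if __name__ == "__main__":
    # Hypothetical module layout for illustration.
    known = {"clients", "core", "streams", "connect/runtime", "connect/file"}
    changed = ["streams/src/main/java/Foo.java",
               "connect/runtime/src/main/java/Bar.java"]
    tasks = gradle_test_tasks(modules_for_changed_files(changed, known))
    # The partial CI job would then invoke: ./gradlew <tasks...>
    print(tasks)
```

A file that does not fall under any known module (e.g. a top-level build script) yields no match here; a real implementation would probably want to fall back to the full suite in that case rather than silently skipping tests.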