Also worth mentioning - Develocity does have support for caching and
build analysis with Bazel -
https://gradle.com/gradle-enterprise-solutions/bazel-build-system/
I don't know about Gradle's partnership with ASF, Are they requiring
Kafka to use Gradle to get the cache and analysis UI?

On Fri, Nov 22, 2024 at 11:57 AM Farid Zakaria
<fzaka...@confluent.io.invalid> wrote:
>
> On Fri, Nov 22, 2024 at 10:47 AM David Arthur <mum...@gmail.com> wrote:
> >
> > Thanks for the KIP, Vince and Farid!
> >
> > A few comments on the KIP
> >
> > DA1: The motivation mentioned historical build times in Apache Kafka, but
> > we should really just consider the current situation. I would like to see
> > the motivation/background make a clearer distinction between trunk builds
> > (no caching) and pull request builds (with caching).
> >
> > DA2: An 80% reduction (3hr -> 30m) in build times seems only possible with
> > aggressive caching and task avoidance. We have implemented the Gradle cache
> > for our PR builds which has only given us around a 20% reduction due to our
> > dependency graph [1] (from 2hr to 1hr40m). This is because a change in
> > ":clients" will result in a full rebuild of nearly everything since it is
> > highly depended on. How does Bazel deal with a situation like this? Would a
> > change in ":clients" only cause the ":clients" module to be rebuilt and
> > tested?
>
> Bazel generates an "interface JAR" post-process or actually can do it at 
> compile
> time using turbine [1].
> I've written a bit about it [2]  if you would like edification.
>
> By checking the ABI of the interface JAR, Bazel can cut-off building
> dependents if the API
> has not changed.
>
> I believe Gradle does something similar to this in the space [3] but
> I'm not sure how it works with
> remote caching.
>
> [1] https://github.com/google/turbine
> [2] 
> https://fzakaria.com/2024/10/29/bazel-knowledge-what-s-an-interface-jar.html
> [3] 
> https://blog.gradle.org/our-approach-to-faster-compilation#what-is-a-header-jar
>
> >
> > This is probably my biggest question about Bazel. How do we get faster
> > build times without sacrificing safety? ("No free lunch" principle)
> >
> > DA3: > Additionally, Gradle resolves a separate dependency tree per module,
> > meaning that final tar.gz and docker images could end up with multiple
> > versions of the same jar, which can have unexpected results.
> >
> > I'm not sure how much this happens in practice. Gradle has a feature known
> > as Platforms which helps manage transitive dependencies
> > https://docs.gradle.org/current/userguide/platforms.html
> >
> > DA4: > This is a heavyweight solution that adds additional network and
> > storage costs on the package registries, and requires moving code around to
> > many “*-common” modules
> >
> > I disagree strongly with this. Restructuring the code into "-api",
> > "-common", etc results in better organized code and a cleaner dependency
> > graph. We only publish to Maven during release time, and those artifacts
> > are rarely consumed (aside from the public API ones like
> > clients/streams/connect). The number of JARs produced isn't really a
> > problem IMO.
> >
> > I think restructuring the code (as we have been doing) would benefit a
> > Bazel build in a similar way that it has been benefiting the Gradle build.
> >
> > DA5:  > Bazel promotes fine-grained targets, allowing each Java package, or
> > even individual files within a package, to have distinct BUILD targets
> >
> > To me this is a bit of an anti-pattern. In my opinion, a build target
> > should be scoped to a module. This has been the standard way of building
> > JVM projects for a long time. Having fine grained file level build targets
> > harkens back to Apache Ant :)
>
> What is a module?....
> It can be a collection of java packages or it can be a single java package.
> Bazel lets you decide the fidelity and you can apply visibility rules so that
> dependencies don't take-on accidental usage.
>
> Not much more to say here; the way Bazel approaches targets is philosophically
> different than Gradle/Maven. It encourages small targets.
> The benefit of per-package targets is you catch cyclic dependencies
> between packages
> easily. Gradle/Maven you can maybe help yourself with `-api` modules
> but you are still
> free to form cycles within each module.
>
> >
> > DA6: For the past few months, we have been enjoying Develocity (formerly
> > Gradle Enterprise) as a way to gather historical build data, investigate
> > build/test failures, and see trends in our build performance and flaky
> > tests. It has been very valuable in the ongoing crusade against flaky
> > tests. Does Bazel have anything like this?
> >
>
> Checkout the BuildBuddy links I sent earlier.
> They have some notable Java projects (Kibana) they support.
>
> > DA7: Not really a comment on the KIP, but more of an FYI about Gradle -- we
> > have received code contributions and general support from Gradle Inc as
> > part of the Gradle + ASF partnership. In addition to the free Develocity
> > instance (ge.apache.org), we will soon have the ability to use the remote
> > cache for developers to use locally.
>
> That's great to hear.
>
> >
> > ---
> >
> > Some comments on the discussion so far:
> >
> > > CP0: Can Bazel help simplify build.gradle (currently around 3826 lines)?
> >
> > The BAZEL files in https://github.com/confluentinc/kafka-bazel total
> > around 6500 lines
> >
> >
> > > SVV3. Before switching to a completely different build system, I think we
> > have to ask ourselves: did we exhaust any optimization in Gradle that can
> > be done, and is it really that slower to build in it?
> >
> > As I mentioned above in DA2, we are now at a point where refactoring the
> > code is the primary path to improve our build times with Gradle.
> > Specifically, breaking up "core" and relocating those tests to appropriate
> > sub-modules will help a lot. As it stands, ":core:test" takes up over an
> > hour of our build times and is often required to be rebuilt in PRs due to
> > our module dependencies.
> >
> > We can see that builds without many code changes are already benefiting
> > from the cache. For example,
> >
> > Docs PR: https://github.com/apache/kafka/pull/17910
> > 34m build time: https://github.com/apache/kafka/actions/runs/11972266657
> >
> > ---
> >
> >
> > Thanks!
> > David A
> >
> > [1]
> > https://ge.apache.org/scans/performance?search.rootProjectNames=kafka&search.tags=github%2Cnot:trunk&search.tasks=test&search.timeZoneId=America%2FNew_York
> >
> >
> > On Fri, Nov 22, 2024 at 12:38 PM Vince Rose <vr...@confluent.io.invalid>
> > wrote:
> >
> > > Bringing fzaka...@confluent.io into the discussion.
> > >
> > > On Fri, Nov 22, 2024 at 10:01 AM Viktor Somogyi-Vass
> > > <viktor.somo...@cloudera.com.invalid> wrote:
> > >
> > > > Hi Vincent
> > > >
> > > > I would also like to say that I'm not an expert in Bazel either.
> > > Generally
> > > > I think it's good to reduce build and test times in every infra. While
> > > > reading up on Bazel, it seems like it's most useful in monolithic code
> > > > bases, such as Confluent as you mentioned in the KIP (so it's not a
> > > > coincidence :) ). I have a few thoughts though:
> > > >
> > > > SVV1. That 175 vs 100 seconds in the no-cache seems to be a big
> > > difference.
> > > > There is a performance benchmark by Gradle (
> > > > https://blog.gradle.org/gradle-vs-bazel-jvm) which might be biased from
> > > > their side, but seems to conclude different results. One thing they
> > > mention
> > > > in the article is that they had one BAZEL.build for a project, instead 
> > > > of
> > > > the package level granularity. In your provided example, you seem to 
> > > > have
> > > > it on a per module basis. Could this be the difference in performance?
> > > Have
> > > > you tried it out with different granularities for Bazel?
> > > >
> > > > SVV2. I can share the arguments in CP2 as well. Bazel would optimally
> > > need
> > > > a build file in every package, which you'll need to maintain manually.
> > > This
> > > > is an extra burden on the developer side. Based on the example you 
> > > > cited,
> > > > Gradle seems to have better JVM integration compared to Bazel as the
> > > latter
> > > > one tries to be a generic build tool and not tuned for the JVM. I think
> > > > that this could be a potential concern.
> > > >
> > > > SVV3. Before switching to a completely different build system, I think 
> > > > we
> > > > have to ask ourselves: did we exhaust any optimization in Gradle that 
> > > > can
> > > > be done, and is it really that slower to build in it? As the article
> > > above
> > > > mentioned, if Bazel build files are defined on package level, then the
> > > > package level cyclic dependencies can be eliminated. This, however, can
> > > be
> > > > done with Java's module system too, which would be the idiomatic Java
> > > way.
> > > > If this is done, perhaps it could speed up Gradle builds as well.
> > > >
> > > > Best,
> > > > Viktor
> > > >
> > > > On Fri, Nov 22, 2024 at 5:43 PM Gaurav Narula <ka...@gnarula.com> wrote:
> > > >
> > > > > Thanks for the KIP Vince and Farid!
> > > > >
> > > > > Glad to see Bazel being considered for Kafka and really impressive 
> > > > > work
> > > > on
> > > > > the demo repository!
> > > > >
> > > > > From the benchmarks, I'd imagine the build would only get faster once
> > > > > BUILD.bazel files don't glob the entire source/test set for a module
> > > but
> > > > > are more fine grained. I suppose it would be even faster if we were to
> > > > use
> > > > > RBE [0] for executing builds/tests in parallel with caching [1]. Would
> > > be
> > > > > nice to get a sense of the potential improvements there if you've
> > > already
> > > > > done some experiments and some possible RBE options that we may use?
> > > > Maybe
> > > > > another motivation/optimisation would be the use of 
> > > > > target-determinator
> > > > > [2], previously discussed in [3] for running partial tests for PRs?
> > > > >
> > > > > I see that you mention potential changes in IDE configurations. It
> > > would
> > > > > be nice to mention plugins for common IDEs that support Bazel. IIRC,
> > > > > IntelliJ has decent Bazel support [4] but (IMHO) it's not as good as
> > > the
> > > > > Gradle integration. Can you perhaps discuss the potential changes in
> > > > > developer workflows, e.g. the need to add dependencies/files to
> > > > BUILD.bazel
> > > > > files manually or running Gazelle to create BUILD.bazel files
> > > > automatically?
> > > > >
> > > > > I see that in the example repository you declare the versions manually
> > > > > [5]. I think it would be cumbersome to maintain two separate versions
> > > > list
> > > > > and perhaps we can generate the version mapping from
> > > > > `gradle/dependencies.gradle`, unless they need to deviate for some
> > > > reason?
> > > > >
> > > > > To add to some of @Chia's questions and perhaps answer them partially:
> > > > >
> > > > > > CP0: Can Bazel help simplify build.gradle (currently around 3826
> > > > lines)?
> > > > >
> > > > > I think some of the complexity moves to *.bzl, *.bazel files [6] [7]
> > > [8]
> > > > >
> > > > > The configuration is in a language called Starlark [9] which is a
> > > subset
> > > > > of Python that avoids non-determinism. I'd say the configuration is
> > > more
> > > > > verbose than Gradle but is easier to reason about. I often find Gradle
> > > > > plugins do a lot of magic and are harder to navigate.
> > > > >
> > > > > > CP1: Gradle commands are part of the public API, so in which Kafka
> > > > > version should we announce the deprecation of Gradle commands?
> > > > >
> > > > > My impression is once we get rid of Gradle so Phase 5?
> > > > >
> > > > > > CP2: The example (https://github.com/confluentinc/kafka-bazel) seems
> > > > to
> > > > > modify the folder structure of the entire Kafka project. Is this 
> > > > > change
> > > > > required? My concern is that such a major change should be done in a
> > > > major
> > > > > release. If restructuring is a requirement for Bazel, does that mean 
> > > > > we
> > > > > have to accept it after this KIP pass?
> > > > >
> > > > > I stumbled upon Farid's blog post [10] which talks a bit about this 
> > > > > and
> > > > my
> > > > > impression is the overlay can be merged into the main repository
> > > without
> > > > > changing its structure, potentially in a side directory as LLVM did
> > > > [11]. I
> > > > > suppose this is what Phase 2 is about?
> > > > >
> > > > > That said, I do think refining the build graph, which includes having
> > > > more
> > > > > fine grained BUILD.bazel files, may warrant moving some
> > > classes/packages
> > > > > around. This is because Bazel isn't happy with cycles in the build
> > > graph
> > > > > while Gradle allows them within a module.
> > > > >
> > > > > I'm looking forward to trying the overlay repository in the coming 
> > > > > days
> > > > > and will follow up with more questions/feedback!
> > > > >
> > > > > Regards,
> > > > > Gaurav
> > > > >
> > > > > [0]: https://bazel.build/remote/rbe
> > > > > [1]: https://bazel.build/remote/caching
> > > > > [2]: https://github.com/bazel-contrib/target-determinator
> > > > > [3]: https://lists.apache.org/thread/xqbj7ywmwf5zfycgo1ckyhl9zs43h1dd
> > > > > [4]: https://github.com/bazelbuild/intellij
> > > > > [5]:
> > > > >
> > > >
> > > https://github.com/confluentinc/kafka-bazel/blob/a165c512c5d43256564b53ee0be2fbfe7388d5f9/kafka-bazel/kafka-overlay/dependencies/dependencies.bzl#L24
> > > > > [6]:
> > > > >
> > > >
> > > https://github.com/confluentinc/kafka-bazel/tree/a165c512c5d43256564b53ee0be2fbfe7388d5f9/kafka-bazel/kafka-overlay/tools-bazel
> > > > > [7]:
> > > > >
> > > >
> > > https://github.com/confluentinc/kafka-bazel/tree/a165c512c5d43256564b53ee0be2fbfe7388d5f9/kafka-bazel/kafka-overlay/dependencies
> > > > > [8]:
> > > > >
> > > >
> > > https://github.com/confluentinc/kafka-bazel/blob/a165c512c5d43256564b53ee0be2fbfe7388d5f9/kafka-bazel/WORKSPACE.bazel
> > > > > [9]: https://github.com/bazelbuild/starlark
> > > > > [10]: https://fzakaria.com/2024/08/29/bazel-overlay-pattern.html
> > > > > [11]: https://github.com/llvm/llvm-project/tree/main/utils/bazel
> > > > >
> > > > > > On 22 Nov 2024, at 03:39, Chia-Ping Tsai <chia7...@apache.org>
> > > wrote:
> > > > > >
> > > > > > hi Vince
> > > > > >
> > > > > > Thanks for introducing this interesting KIP! I'm not an expert in
> > > > Bazel,
> > > > > so please be patient with my potentially odd questions:
> > > > > >
> > > > > > CP0: Can Bazel help simplify build.gradle (currently around 3826
> > > > lines)?
> > > > > >
> > > > > > CP1: Gradle commands are part of the public API, so in which Kafka
> > > > > version should we announce the deprecation of Gradle commands?
> > > > > >
> > > > > > CP2: The example (https://github.com/confluentinc/kafka-bazel) seems
> > > > to
> > > > > modify the folder structure of the entire Kafka project. Is this 
> > > > > change
> > > > > required? My concern is that such a major change should be done in a
> > > > major
> > > > > release. If restructuring is a requirement for Bazel, does that mean 
> > > > > we
> > > > > have to accept it after this KIP pass?
> > > > > >
> > > > > > Best,
> > > > > > Chia-Ping
> > > > > >
> > > > > > On 2024/11/21 21:16:50 Vince Rose wrote:
> > > > > >> Hey devs,
> > > > > >>
> > > > > >> I want to start a discussion about introducing Bazel as a build
> > > system
> > > > > for
> > > > > >> Apache Kafka.
> > > > > >>
> > > > > >>
> > > > >
> > > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1115%3A+Bazel+Builds
> > > > > >>
> > > > > >> -Vince
> > > > > >>
> > > > >
> > > > >
> > > >
> > >
> >
> >
> > --
> > David Arthur

Reply via email to