https://github.com/apache/pulsar/pull/25359
PoC PR: https://github.com/merlimat/pulsar/pull/16

---------------------------------------------------------------------------------------------

# PIP-463: Migrate Build System from Maven to Gradle

# Background Knowledge

Apache Pulsar currently uses Maven as its build system. The project has grown to over 100 modules with complex dependency relationships, shaded JARs, NAR packaging, and Docker image builds. Maven's sequential execution model and limited caching capabilities result in long build times that impact developer productivity and CI throughput.

[Gradle](https://gradle.org/) is a modern build system used by large-scale Java projects (e.g., Spring Boot, Micronaut, Apache Kafka). It provides parallel task execution, fine-grained caching, and incremental compilation out of the box.

# Motivation

The current Maven build has several pain points that affect developer velocity and CI efficiency:

**Slow local builds.** A full `mvn install -DskipTests` takes 5-8 minutes on a modern machine. Developers frequently wait for unrelated modules to rebuild when iterating on a single component. Maven has no built-in mechanism to skip unchanged modules; it rebuilds everything in the reactor.

**Slow CI.** The CI pipeline takes 50-60 minutes end-to-end. Maven's lack of caching means each CI run starts from scratch. Test jobs must either rebuild everything or rely on fragile artifact-sharing workarounds.

**Imprecise dependency tracking.** Maven treats the entire module as the unit of rebuild. Changing a test resource file triggers a full recompile of the module. There is no way to run "only the tests affected by my change"; developers must run the entire test suite for a module or manually specify test classes.

**Limited parallelism.** Maven's `-T` flag enables module-level parallelism, but tasks within a module still run sequentially.
The Pulsar build has several bottleneck modules (e.g., `pulsar-broker`) where compilation, resource processing, and test execution could overlap with other modules but don't.

**Complex shading and packaging.** The project uses Maven Shade plugin, NAR plugin, and custom Ant tasks for packaging. These configurations are verbose, hard to maintain, and have subtle interactions (e.g., the `ahc-default.properties` content replacement for AsyncHttpClient requires an Ant `<replace>` task in Maven but is a single `filesMatching` call in Gradle).

**Poor IDE integration for multi-module builds.** IntelliJ IDEA's Maven import for a project of Pulsar's size is slow and memory-intensive. Gradle's tooling API provides faster, more reliable IDE synchronization.

# Goals

## In Scope

- **1:1 functional equivalence with Maven.** The Gradle build produces identical artifacts:
  - Server distribution tarball (`apache-pulsar-X.Y.Z-bin.tar.gz`) with the same JARs
  - Shell distribution tarball
  - IO connectors distribution (NAR files)
  - Offloaders distribution (NAR files)
  - Docker images (`pulsar`, `pulsar-all`, `java-test-image`, `pulsar-test-latest-version`)
  - Shaded client JARs (`pulsar-client`, `pulsar-client-admin`, `pulsar-client-all`) verified to contain the same classes and relocations as Maven output
- **All CI tests passing.** Unit tests, integration tests, system tests, shade tests (Java 17/21/24), and backward compatibility tests all pass on the Gradle build.
- **Enforced dependency management.** A `pulsar-dependencies` platform module (Gradle's equivalent of Maven's `dependencyManagement`) ensures consistent dependency versions across all modules.
- **Version catalog.** A single `gradle/libs.versions.toml` file defines all dependency coordinates and versions, replacing scattered version properties across 100+ POM files.
- **CI workflow migration.** All GitHub Actions workflows converted from Maven to Gradle commands.
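
As a rough illustration of the enforced-platform and version-catalog goals above, a module build script might look like the sketch below. It is not copied from the PoC PR: the file location, the `java-library` plugin, and the presence of a `netty-buffer` alias in `gradle/libs.versions.toml` are assumptions, and the proposal applies the platform to all subprojects rather than declaring it in each module.

```kotlin
// <module>/build.gradle.kts (hypothetical module; illustrative sketch only)
plugins {
    `java-library`
}

dependencies {
    // Pin direct and transitive dependency versions to those declared by the
    // pulsar-dependencies platform module. Shown per-module here for clarity;
    // the proposal describes applying it to all subprojects.
    implementation(enforcedPlatform(project(":pulsar-dependencies")))

    // Type-safe accessor generated from gradle/libs.versions.toml.
    // Assumes the catalog defines a "netty-buffer" alias.
    implementation(libs.netty.buffer)
}
```

With this arrangement no version string appears in the module script: the catalog supplies the coordinates and the platform decides which version wins during resolution.
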
## Out of Scope

- Changing the project's module structure or merging/splitting modules
- Migrating to Kotlin DSL for production source code
- Gradle-specific optimizations beyond what Maven provides (e.g., build cache server, remote caching); these are future enhancements enabled by the migration
- Removing the ability to build individual modules in isolation

# High Level Design

The migration introduces Gradle build scripts alongside (and eventually replacing) the existing Maven POM files. The approach is:

1. **Add Gradle build files** for all modules (`build.gradle.kts`, `settings.gradle.kts`, `gradle/libs.versions.toml`)
2. **Convert CI workflows** from Maven to Gradle commands
3. **Remove Maven files** (`pom.xml`, `mvnw`, `.mvn/`)

The Gradle build is structured as:

```
settings.gradle.kts              # Module includes and plugin repositories
build.gradle.kts                 # Root build: common config, enforced platform
gradle/libs.versions.toml        # Version catalog (single source of truth for versions)
pulsar-dependencies/             # Enforced platform module (replaces dependencyManagement)
<module>/build.gradle.kts        # Per-module build script
```

Key design decisions:

- **Shadow plugin** for shaded JARs (replaces Maven Shade), with `filesMatching` for property file content relocation
- **NAR plugin** (`io.github.merlimat.nar`) for connector packaging
- **LightProto plugin** for protobuf/lightproto code generation
- **Conditional project includes** for shade test modules (avoids implicit parent project conflicts)
- **Enforced platform** (`pulsar-dependencies`) for version pinning across all modules

# Detailed Design

## Design & Implementation Details

### Build Performance Improvements

| Aspect | Maven | Gradle |
|--------|-------|--------|
| Incremental compilation | No | Yes: only recompiles changed files |
| Task-level caching | No | Yes: skips tasks whose inputs haven't changed |
| Parallel execution | Module-level only (`-T`) | Task-level (automatic dependency graph) |
| Configuration caching | No | Yes: reuses build configuration across runs |
| Local build cache | No | Yes: persists across builds |
| Remote build cache | No | Yes: shared across CI and developers (future) |

**Expected impact:**

- Local incremental builds (after initial): **seconds** instead of minutes
- CI with caching: **30-50% faster** (exact numbers depend on cache hit rates)
- "Build only what I need to test": `./gradlew :pulsar-broker:test` builds only the broker and its dependencies, skipping unrelated modules entirely

### Develocity Integration

Gradle provides native integration with [Develocity](https://gradle.com/develocity/) (formerly Gradle Enterprise), hosted by the ASF at `develocity.apache.org`. Every CI build automatically publishes a build scan that provides:

- **Test execution details**: per-test timings, pass/fail status, output logs, and stack traces, all searchable and filterable without downloading CI artifacts
- **Task execution timeline**: visual breakdown of what ran, what was cached, and what was up-to-date, making it easy to identify bottleneck tasks
- **Dependency resolution**: full dependency tree with conflict resolution details
- **Build comparison**: diff two builds to see what changed in task execution or outputs
- **Failure analysis**: aggregated view of flaky tests across builds

Example build scan from the PoC CI run: [https://develocity.apache.org/s/h6ckzn3nn4w2s](https://develocity.apache.org/s/h6ckzn3nn4w2s)

This level of observability is not available with the Maven build today.

### Dependency Management

Maven's `dependencyManagement` in the root POM is replaced by:

1. **Version catalog** (`gradle/libs.versions.toml`): Defines all dependency coordinates and version numbers in one file. Modules reference dependencies as `libs.netty.buffer` instead of hardcoded group:artifact:version strings.
2. **Enforced platform** (`pulsar-dependencies`): A `java-platform` module that creates version constraints from the catalog.
   Applied to all subprojects via `implementation(enforcedPlatform(project(":pulsar-dependencies")))`. This ensures transitive dependencies are pinned to the same versions Maven would resolve.

### Shaded JAR Configuration

The Shadow plugin replaces Maven Shade. Key differences handled:

- **AsyncHttpClient properties**: Maven uses Ant `<replace>` to fix property name prefixes in `ahc-default.properties`. Gradle uses `filesMatching { filter { } }`.
- **Dependency include/exclude**: Shadow's `dependencies { include/exclude }` DSL replaces Maven Shade's `<includes>/<excludes>`.
- **Relocation**: Shadow's `relocate()` is functionally identical to Maven Shade's.

### NAR Packaging

A custom NAR Gradle plugin (`io.github.merlimat.nar`) handles connector packaging. Global exclusions for platform modules (provided by `java-instance.jar` at runtime) are configured in the root `build.gradle.kts`.

### Module-Specific Overrides

Some modules require version overrides that differ from the enforced platform:

- **`kinesis-kpl-shaded`**: Forces `protobuf-java:4.29.0` (KPL requires protobuf 4.x, while Pulsar uses 3.x). Protobuf is relocated, so there is no runtime conflict.
- **`jclouds-shaded`**: Forces Guice 7.0.0, `jakarta.annotation-api:3.0.0`, `jakarta.ws.rs-api:3.1.0`, `jakarta.inject-api:2.0.1` (jclouds 2.6.0 requires Jakarta EE 10+ APIs). All are bundled in the shadow JAR.

## Public-facing Changes

### Configuration

No new broker/client configuration options. The build system change is transparent to users.

### CLI

- `mvn` commands replaced by `./gradlew` commands in documentation and scripts
- `src/set-project-version.sh` updated to modify `gradle/libs.versions.toml`

### Binary Artifacts

Artifacts are functionally identical. Minor differences:

- Some shaded JARs may have slightly different class counts due to Shadow vs Shade plugin differences in handling `package-info.class` files (no runtime impact)

# Security Considerations

No security implications.
The build system change does not affect Pulsar's runtime security model, authentication, or authorization. The Gradle wrapper (`gradlew`) is committed to the repository with a checksum-verified distribution URL, following the same security model as the Maven wrapper.

# General Notes

The implementation PR demonstrates full CI green status across all test suites, confirming functional equivalence with the Maven build.

# Links

* Proof of Concept PR (CI fully green): https://github.com/merlimat/pulsar/pull/16
* Mailing List discussion thread: [link]
* Mailing List voting thread: [link]

--
Matteo Merli <[email protected]>
