+1

On Sun, 22 Mar 2026 at 07:09, Matteo Merli <[email protected]> wrote:
> https://github.com/apache/pulsar/pull/25383
>
> # PIP-465: Split IO Connectors into Separate Repository
>
> # Background Knowledge
>
> Apache Pulsar ships ~30 IO connectors (Kafka, Kinesis, Cassandra,
> Elasticsearch, JDBC, Debezium, etc.) as part of its main repository.
> These connectors are packaged as NAR files and bundled into a
> `pulsar-all` Docker image alongside the core broker, client, and
> functions runtime.
>
> Each connector brings its own dependency tree — often large and
> conflicting with other connectors or with Pulsar's core dependencies.
> The connectors interact with Pulsar exclusively through the stable
> `pulsar-io-core` API, making them natural candidates for independent
> development and release.
>
> # Motivation
>
> The primary goal of this PIP is to **make development of Pulsar
> easier** by shrinking the core codebase. Removing ~30 connectors and
> their dependency trees from the main repository will massively improve
> compile time, test execution time, CI resource consumption, and CI
> stability.
>
> **Build and CI impact.** Compiling and packaging 30+ connector NARs
> adds significant time to every CI run and local build, even when a
> developer is only working on the broker or client. The connectors
> collectively bring hundreds of transitive dependencies into the build
> graph, which slows down dependency resolution, inflates vulnerability
> reports (OWASP checks must scan connector dependencies), and creates
> version conflicts that require careful management in the main
> repository's BOM. Removing them dramatically reduces the surface area
> of the build.
>
> **Release coupling.** Connectors are tied to the Pulsar release cycle.
> A bug fix in a single connector (e.g., updating the Elasticsearch
> client) requires waiting for the next Pulsar release. Conversely, a
> Pulsar patch release must rebuild all connectors even when none of
> them changed.
> The release cadence for connectors will be independent from Pulsar
> releases, similar to what we already do for client SDKs (Go, Python,
> Node.js).
>
> **Low integration risk.** The `pulsar-io-core` API that connectors
> depend on has been very stable for a long time. There have been no
> breaking changes to the connector API in years, so there is
> essentially no risk of integration pain from this split.
>
> **Docker image bloat.** The `pulsar-all` image bundles every connector
> NAR, weighing in at ~2.9 GB — a very large image that most deployments
> don't need. Users typically deploy only 1-2 connectors but pay the
> image pull cost for all of them. The main reason users chose
> `pulsar-all` over `pulsar` was to get the tiered-storage offloaders —
> this PIP addresses that by packaging the offloader NARs directly into
> the `pulsar` image. Users who need specific connectors can still build
> tailored images by adding just the connector NARs they need on top of
> `apachepulsar/pulsar`.
>
> **Independent velocity.** Connector maintainers should be able to
> release new connector versions against a stable Pulsar API without
> coordinating with the core release train.
>
> # Goals
>
> ## In Scope
>
> - **Create `apache/pulsar-connectors` repository** containing all IO
>   connector modules, with their own Gradle build, version catalog, and
>   CI pipeline. The repository is forked from the main Pulsar
>   repository to preserve full git history.
>
> - **Remove connector modules from the main Pulsar repository.** Retain
>   only:
>   - `pulsar-io-core` (the connector API)
>   - `pulsar-io-data-generator` (minimal connector used in integration
>     tests)
>   - The functions runtime and worker that load connectors at runtime
>
> - **Remove the `pulsar-all` Docker image.** The image is too large and
>   most users don't need all connectors in a single image. The `pulsar`
>   image becomes the single official image.
>   Tiered-storage offloader NARs — the main reason users chose
>   `pulsar-all` — are included directly in the `pulsar` image.
>
> - **Independent connector releases.** The `pulsar-connectors`
>   repository has its own versioning and release cadence, independent
>   from Pulsar releases — similar to what we already do for client
>   SDKs. It can release new connector versions against any compatible
>   Pulsar release.
>
> - **Connector distribution packaging.** The connectors repository
>   produces a single release containing all connector NARs, as a
>   distribution tarball that users can deploy into an existing Pulsar
>   installation.
>
> ## Out of Scope
>
> - Changing the connector API (`pulsar-io-core`)
> - Changing how the functions worker discovers and loads connector NARs
> - A connector marketplace or registry (future enhancement)
> - Splitting out tiered-storage offloaders into their own repository
>
> # High Level Design
>
> The split creates two repositories from what is currently one:
>
> ```
> apache/pulsar (main repo)
> ├── pulsar-io/core/              # Connector API (retained)
> ├── pulsar-io/data-generator/    # Test connector (retained)
> ├── pulsar-functions/            # Runtime + worker (retained)
> ├── docker/pulsar/               # Single Docker image
> └── (broker, client, etc.)
>
> apache/pulsar-connectors (new repo)
> ├── aerospike/
> ├── aws/
> ├── cassandra/
> ├── debezium/
> │   ├── core/
> │   ├── mysql/
> │   ├── postgres/
> │   └── ...
> ├── elastic-search/
> ├── jdbc/
> │   ├── core/
> │   ├── postgres/
> │   └── ...
> ├── kafka/
> ├── kafka-connect-adaptor/
> ├── kinesis/
> ├── rabbitmq/
> ├── ... (all other connectors)
> ├── distribution/io/             # Distribution packaging
> └── docs/                        # Connector docs generation
> ```
>
> The connectors repository consumes Pulsar artifacts (`pulsar-io-core`,
> `pulsar-client`, etc.) as external Maven dependencies, not as source
> dependencies. This ensures connectors build against the published API
> and don't accidentally depend on internals.
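
To make the "external Maven dependencies, not source dependencies" point concrete, here is a rough sketch of what the Gradle settings file of the new repository could look like. This is only an illustration under my own assumptions: the file contents, the choice of Maven Central as the only repository, and the exact subproject names are guesses based on the directory tree above, not taken from the PR.

```kotlin
// settings.gradle.kts: illustrative sketch only, not the actual file.
rootProject.name = "pulsar-connectors"

dependencyResolutionManagement {
    repositories {
        // Pulsar artifacts (pulsar-io-core, pulsar-client, ...) are resolved
        // as published binaries. There is no includeBuild() of the main
        // Pulsar repository, so connectors cannot reach into its sources.
        mavenCentral()
    }
}

// Connectors sit at the top level; connector families with several
// variants become nested subprojects (assumed names, from the tree above).
include("aerospike")
include("cassandra")
include("debezium:core", "debezium:mysql", "debezium:postgres")
include("jdbc:core", "jdbc:postgres")
include("kafka")
include("kinesis")
```

With a layout like this, `include("debezium:core")` maps to the `debezium/core/` directory by Gradle's default convention, which matches the flattened structure in the tree.
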
>
> # Detailed Design
>
> ## Repository Structure
>
> The new `pulsar-connectors` repository is forked from the main Pulsar
> repository to preserve git history, then trimmed to contain only
> connector-related modules. Connectors are promoted from nested
> `pulsar-io/<name>` paths to top-level `<name>/` directories for a
> flatter structure.
>
> ## Build Configuration
>
> The connectors repository has its own:
> - `settings.gradle.kts` with all connector modules
> - `gradle/libs.versions.toml` with connector-specific dependency
>   versions
> - `pulsar-dependencies/` platform module pinning Pulsar artifact
>   versions
> - `build.gradle.kts` root build with shared configuration
>
> Pulsar core artifacts are declared as dependencies with a configurable
> version:
> ```kotlin
> implementation("org.apache.pulsar:pulsar-io-core:${pulsarVersion}")
> ```
>
> ## Versioning Strategy
>
> The connectors repository uses its own version scheme, independent of
> Pulsar's version. All connectors are released together as a single
> release (not individually), and each release specifies which Pulsar
> versions it is compatible with (e.g., "connectors 1.0.0 is compatible
> with Pulsar 4.x").
>
> ## Docker Image Changes
>
> The `pulsar-all` image is removed. It bundled all connector NARs
> alongside the broker, producing a very large image that most
> deployments didn't need. The main reason users chose `pulsar-all` over
> `pulsar` was to get the tiered-storage offloaders.
> With this change:
>
> - Tiered-storage offloader NARs move into the `pulsar` image,
>   eliminating the primary reason for `pulsar-all` to exist
> - The `pulsar` Docker image becomes the single official image,
>   containing the broker, functions runtime, and tiered-storage
>   offloader NARs
> - Users who need specific connectors can build tailored images by
>   adding just the connector NARs they need on top of
>   `apachepulsar/pulsar`, or mount them via volume mounts
>
> ## CI and Testing
>
> - The main Pulsar repository's CI no longer builds or tests connectors
> - The connectors repository has its own CI that builds and tests all
>   connectors
> - Integration tests that exercise specific connectors (e.g., Cassandra
>   sink, Kafka source) move to the connectors repository
> - The main repository retains integration tests using `data-generator`
>   for testing the connector loading and runtime machinery
>
> ## Migration for Users
>
> Users who currently use `pulsar-all` Docker image:
> 1. Switch to the `pulsar` Docker image
> 2. Download needed connector NARs from the connectors release
> 3. Mount NARs into the container (e.g., via volume mount to
>    `/pulsar/connectors/`)
>
> Users who build from source:
> 1. Build the main Pulsar repository as before (faster, since
>    connectors are gone)
> 2. Build the connectors repository separately if needed
>
> ## Public-facing Changes
>
> ### Docker Images
>
> | Before | After |
> |--------|-------|
> | `pulsar` — core only | `pulsar` — core + tiered-storage offloaders |
> | `pulsar-all` — core + all connectors + offloaders | *(removed)* |
>
> ### Artifacts
>
> - All connector NARs move from the main Pulsar release to a single
>   unified release from the `pulsar-connectors` repository
> - All other Pulsar artifacts remain unchanged
>
> ### Configuration
>
> No changes to broker, client, or functions worker configuration.
>
> # Backward & Forward Compatibility
>
> ## Backward Compatibility
>
> The connector API (`pulsar-io-core`) does not change. Existing
> connector NARs continue to work with the functions worker without
> modification.
>
> The `pulsar-io-core` API has been very stable for years with no
> breaking changes, so connectors built against older API versions will
> continue to work with newer Pulsar releases and vice versa.
>
> ## Forward Compatibility
>
> New connector releases can target older Pulsar versions, as long as
> the `pulsar-io-core` API they depend on is compatible. Given the long
> track record of API stability, this is expected to work seamlessly
> across Pulsar 4.x releases.
>
> # Security Considerations
>
> No security implications. Connectors continue to be loaded through the
> same NAR classloader isolation mechanism. The split does not change
> the security model.
>
> Separating connector dependencies from the main repository actually
> improves security posture by reducing the attack surface of the core
> Pulsar build and making connector dependency updates independently
> releasable.
>
>
> --
> Matteo Merli
> <[email protected]>
>
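
On the compatibility statement ("connectors 1.0.0 is compatible with Pulsar 4.x"): the PIP does not say how that declaration would be recorded or checked, so the following is only a minimal sketch of one possible reading, where a connectors release lists the Pulsar major versions it supports. `ConnectorRelease`, `majorOf`, and `isCompatible` are hypothetical names invented for this example.

```kotlin
// Hypothetical model: a connectors release declares the Pulsar major
// versions it supports. Nothing here comes from the PIP or the PR.
data class ConnectorRelease(val version: String, val supportedPulsarMajors: Set<Int>)

// Extract the major component from a version string such as "4.0.3".
// Returns null when the leading component is not an integer.
fun majorOf(version: String): Int? = version.substringBefore('.').toIntOrNull()

// A Pulsar version is treated as compatible when its major version is
// listed by the connectors release.
fun isCompatible(release: ConnectorRelease, pulsarVersion: String): Boolean {
    val major = majorOf(pulsarVersion) ?: return false
    return major in release.supportedPulsarMajors
}

fun main() {
    val release = ConnectorRelease("1.0.0", setOf(4))
    println(isCompatible(release, "4.0.3")) // prints true
    println(isCompatible(release, "3.3.1")) // prints false
}
```

A major-version check is the coarsest possible rule; if the connectors release ever needs to exclude specific minor versions, the declaration would have to carry a version range instead.
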
