+1

On Sun, 22 Mar 2026 at 07:09, Matteo Merli <[email protected]> wrote:

> https://github.com/apache/pulsar/pull/25383
>
> # PIP-465: Split IO Connectors into Separate Repository
>
> # Background Knowledge
>
> Apache Pulsar ships ~30 IO connectors (Kafka, Kinesis, Cassandra,
> Elasticsearch, JDBC, Debezium,
> etc.) as part of its main repository. These connectors are packaged as
> NAR files and bundled into
> a `pulsar-all` Docker image alongside the core broker, client, and
> functions runtime.
>
> Each connector brings its own dependency tree — often large and
> conflicting with other connectors
> or with Pulsar's core dependencies. The connectors interact with
> Pulsar exclusively through the
> stable `pulsar-io-core` API, making them natural candidates for
> independent development and release.
>
> # Motivation
>
> The primary goal of this PIP is to **make development of Pulsar easier** by shrinking the core
> codebase. Removing ~30 connectors and their dependency trees from the main repository should
> substantially reduce compile time, test execution time, and CI resource consumption, and
> improve CI stability.
>
> **Build and CI impact.** Compiling and packaging ~30 connector NARs
> adds significant time to
> every CI run and local build, even when a developer is only working on
> the broker or client.
> The connectors collectively bring hundreds of transitive dependencies
> into the build graph,
> which slows down dependency resolution, inflates vulnerability reports
> (OWASP checks must scan
> connector dependencies), and creates version conflicts that require
> careful management in the
> main repository's BOM. Removing them dramatically reduces the surface
> area of the build.
>
> **Release coupling.** Connectors are tied to the Pulsar release cycle.
> A bug fix in a single
> connector (e.g., updating the Elasticsearch client) requires waiting
> for the next Pulsar release.
> Conversely, a Pulsar patch release must rebuild all connectors even
> when none of them changed.
> The release cadence for connectors will be independent from Pulsar
> releases, similar to what
> we already do for client SDKs (Go, Python, Node.js).
>
> **Low integration risk.** The `pulsar-io-core` API that connectors
> depend on has been very
> stable for a long time. There have been no breaking changes to the
> connector API in years,
> so there is essentially no risk of integration pain from this split.
>
> **Docker image bloat.** The `pulsar-all` image bundles every connector
> NAR, weighing in at
> ~2.9 GB — a very large image that most deployments don't need. Users
> typically deploy only
> 1-2 connectors but pay the image pull cost for all of them. The main
> reason users chose `pulsar-all` over `pulsar` was to get the tiered-storage
> offloaders. This PIP addresses that by packaging the
> offloader NARs directly into the `pulsar` image. Users who need
> specific connectors can still
> build tailored images by adding just the connector NARs they need on
> top of `apachepulsar/pulsar`.
>
> **Independent velocity.** Connector maintainers should be able to
> release new connector versions
> against a stable Pulsar API without coordinating with the core release
> train.
>
> # Goals
>
> ## In Scope
>
> - **Create `apache/pulsar-connectors` repository** containing all IO connector modules, with
>   their own Gradle build, version catalog, and CI pipeline. The repository is forked from the
>   main Pulsar repository to preserve full git history.
>
> - **Remove connector modules from the main Pulsar repository.** Retain only:
>   - `pulsar-io-core` (the connector API)
>   - `pulsar-io-data-generator` (minimal connector used in integration tests)
>   - The functions runtime and worker that load connectors at runtime
>
> - **Remove the `pulsar-all` Docker image.** The image is too large and most users don't need
>   all connectors in a single image. The `pulsar` image becomes the single official image.
>   Tiered-storage offloader NARs, the main reason users chose `pulsar-all`, are included
>   directly in the `pulsar` image.
>
> - **Independent connector releases.** The `pulsar-connectors` repository has its own versioning
>   and release cadence, independent from Pulsar releases, similar to what we already do for
>   client SDKs. It can release new connector versions against any compatible Pulsar release.
>
> - **Connector distribution packaging.** The connectors repository produces a single release
>   containing all connector NARs, as a distribution tarball that users can deploy into an
>   existing Pulsar installation.
>
> ## Out of Scope
>
> - Changing the connector API (`pulsar-io-core`)
> - Changing how the functions worker discovers and loads connector NARs
> - A connector marketplace or registry (future enhancement)
> - Splitting out tiered-storage offloaders into their own repository
>
> # High Level Design
>
> The split creates two repositories from what is currently one:
>
> ```
> apache/pulsar (main repo)
> ├── pulsar-io/core/          # Connector API (retained)
> ├── pulsar-io/data-generator/ # Test connector (retained)
> ├── pulsar-functions/        # Runtime + worker (retained)
> ├── docker/pulsar/           # Single Docker image
> └── (broker, client, etc.)
>
> apache/pulsar-connectors (new repo)
> ├── aerospike/
> ├── aws/
> ├── cassandra/
> ├── debezium/
> │   ├── core/
> │   ├── mysql/
> │   ├── postgres/
> │   └── ...
> ├── elastic-search/
> ├── jdbc/
> │   ├── core/
> │   ├── postgres/
> │   └── ...
> ├── kafka/
> ├── kafka-connect-adaptor/
> ├── kinesis/
> ├── rabbitmq/
> ├── ... (all other connectors)
> ├── distribution/io/         # Distribution packaging
> └── docs/                    # Connector docs generation
> ```
>
> The connectors repository consumes Pulsar artifacts (`pulsar-io-core`,
> `pulsar-client`, etc.)
> as external Maven dependencies, not as source dependencies. This
> ensures connectors build against
> the published API and don't accidentally depend on internals.
>
> # Detailed Design
>
> ## Repository Structure
>
> The new `pulsar-connectors` repository is forked from the main Pulsar
> repository to preserve
> git history, then trimmed to contain only connector-related modules.
> Connectors are promoted
> from nested `pulsar-io/<name>` paths to top-level `<name>/`
> directories for a flatter structure.
>
> ## Build Configuration
>
> The connectors repository has its own:
> - `settings.gradle.kts` with all connector modules
> - `gradle/libs.versions.toml` with connector-specific dependency versions
> - `pulsar-dependencies/` platform module pinning Pulsar artifact versions
> - `build.gradle.kts` root build with shared configuration
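>
> As a rough illustration of the first item, a `settings.gradle.kts` for the flattened layout
> might look like the sketch below. The root project name and the selection of modules are
> assumptions drawn from the directory tree above, not the actual file:
>
> ```kotlin
> // settings.gradle.kts (hypothetical sketch, not the real file)
> rootProject.name = "pulsar-connectors"
>
> // Platform module that pins Pulsar artifact versions
> include("pulsar-dependencies")
>
> // Top-level connector modules
> include("kafka", "kinesis", "cassandra", "rabbitmq")
>
> // Nested connector families; "jdbc:core" maps to the jdbc/core/ directory
> include("jdbc:core", "jdbc:postgres")
> include("debezium:core", "debezium:mysql", "debezium:postgres")
> ```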
>
> Pulsar core artifacts are declared as dependencies with a configurable
> version:
> ```kotlin
> implementation("org.apache.pulsar:pulsar-io-core:${pulsarVersion}")
> ```
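>
> One way the `pulsar-dependencies` platform module could pin that version is sketched
> below. The `pulsarVersion` Gradle property and the list of pinned artifacts are assumptions
> made for illustration:
>
> ```kotlin
> // pulsar-dependencies/build.gradle.kts (hypothetical sketch)
> plugins {
>     `java-platform`
> }
>
> // Assumed to be supplied via gradle.properties or -PpulsarVersion=<version>
> val pulsarVersion: String = providers.gradleProperty("pulsarVersion").get()
>
> dependencies {
>     constraints {
>         // Every module that imports this platform resolves these versions
>         api("org.apache.pulsar:pulsar-io-core:$pulsarVersion")
>         api("org.apache.pulsar:pulsar-client:$pulsarVersion")
>     }
> }
> ```
>
> A connector module would then import the platform with
> `implementation(platform(project(":pulsar-dependencies")))` and declare
> `implementation("org.apache.pulsar:pulsar-io-core")` without a version.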
>
> ## Versioning Strategy
>
> The connectors repository uses its own version scheme, independent of
> Pulsar's version.
> All connectors are released together as a single release (not
> individually), and each
> release specifies which Pulsar versions it is compatible with (e.g.,
> "connectors 1.0.0
> is compatible with Pulsar 4.x").
>
> ## Docker Image Changes
>
> The `pulsar-all` image is removed. It bundled all connector NARs
> alongside the broker,
> producing a very large image that most deployments didn't need. The
> main reason users chose
> `pulsar-all` over `pulsar` was to get the tiered-storage offloaders.
> With this change:
>
> - Tiered-storage offloader NARs move into the `pulsar` image, eliminating the primary reason
>   for `pulsar-all` to exist
> - The `pulsar` Docker image becomes the single official image, containing the broker, functions
>   runtime, and tiered-storage offloader NARs
> - Users who need specific connectors can build tailored images by adding just the connector
>   NARs they need on top of `apachepulsar/pulsar`, or mount the NARs into the container as
>   volumes
>
> ## CI and Testing
>
> - The main Pulsar repository's CI no longer builds or tests connectors
> - The connectors repository has its own CI that builds and tests all connectors
> - Integration tests that exercise specific connectors (e.g., Cassandra sink, Kafka source)
>   move to the connectors repository
> - The main repository retains integration tests using `data-generator` for testing the
>   connector loading and runtime machinery
>
> ## Migration for Users
>
> Users who currently use the `pulsar-all` Docker image:
> 1. Switch to the `pulsar` Docker image
> 2. Download needed connector NARs from the connectors release
> 3. Mount NARs into the container (e.g., via volume mount to `/pulsar/connectors/`)
>
> Users who build from source:
> 1. Build the main Pulsar repository as before (faster, since connectors are gone)
> 2. Build the connectors repository separately if needed
>
> ## Public-facing Changes
>
> ### Docker Images
>
> | Before | After |
> |--------|-------|
> | `pulsar` — core only | `pulsar` — core + tiered-storage offloaders |
> | `pulsar-all` — core + all connectors + offloaders | *(removed)* |
>
> ### Artifacts
>
> - All connector NARs move from the main Pulsar release to a single unified release from
>   the `pulsar-connectors` repository
> - All other Pulsar artifacts remain unchanged
>
> ### Configuration
>
> No changes to broker, client, or functions worker configuration.
>
> # Backward & Forward Compatibility
>
> ## Backward Compatibility
>
> The connector API (`pulsar-io-core`) does not change. Existing
> connector NARs continue
> to work with the functions worker without modification.
>
> The `pulsar-io-core` API has been very stable for years with no
> breaking changes, so connectors
> built against older API versions will continue to work with newer
> Pulsar releases and vice versa.
>
> ## Forward Compatibility
>
> New connector releases can target older Pulsar versions, as long as
> the `pulsar-io-core`
> API they depend on is compatible. Given the long track record of API
> stability, this is
> expected to work seamlessly across Pulsar 4.x releases.
>
> # Security Considerations
>
> No negative security implications. Connectors continue to be loaded through the
> same NAR classloader
> isolation mechanism. The split does not change the security model.
>
> Separating connector dependencies from the main repository actually
> improves security posture
> by reducing the attack surface of the core Pulsar build and making
> connector dependency
> updates independently releasable.
>
>
>
> --
> Matteo Merli
> <[email protected]>
>
