https://github.com/apache/pulsar/pull/25383
# PIP-465: Split IO Connectors into Separate Repository
# Background Knowledge
Apache Pulsar ships ~30 IO connectors (Kafka, Kinesis, Cassandra, Elasticsearch, JDBC, Debezium,
etc.) as part of its main repository. These connectors are packaged as NAR files and bundled into
a `pulsar-all` Docker image alongside the core broker, client, and functions runtime.

Each connector brings its own dependency tree, often large and conflicting with other connectors
or with Pulsar's core dependencies. The connectors interact with Pulsar exclusively through the
stable `pulsar-io-core` API, making them natural candidates for independent development and release.
# Motivation
The primary goal of this PIP is to **make development of Pulsar easier** by shrinking the core
codebase. Removing ~30 connectors and their dependency trees from the main repository should
substantially reduce compile time, test execution time, and CI resource consumption, and
improve CI stability.

**Build and CI impact.** Compiling and packaging ~30 connector NARs adds significant time to
every CI run and local build, even when a developer is only working on the broker or client.
The connectors collectively bring hundreds of transitive dependencies into the build graph,
which slows down dependency resolution, inflates vulnerability reports (OWASP checks must scan
connector dependencies), and creates version conflicts that require careful management in the
main repository's BOM. Removing them takes all of those dependencies out of the core build.

**Release coupling.** Connectors are tied to the Pulsar release cycle. A bug fix in a single
connector (e.g., updating the Elasticsearch client) requires waiting for the next Pulsar release.
Conversely, a Pulsar patch release must rebuild all connectors even when none of them changed.
After the split, the release cadence for connectors will be independent from Pulsar releases,
similar to what we already do for client SDKs (Go, Python, Node.js).

**Low integration risk.** The `pulsar-io-core` API that connectors depend on has been very
stable for a long time. There have been no breaking changes to the connector API in years,
so the risk of integration pain from this split is low.

**Docker image bloat.** The `pulsar-all` image bundles every connector NAR, weighing in at
~2.9 GB, far more than most deployments need. Users typically deploy only one or two connectors
but pay the image pull cost for all of them. The main reason users chose `pulsar-all` over
`pulsar` was to get the tiered-storage offloaders; this PIP addresses that by packaging the
offloader NARs directly into the `pulsar` image. Users who need specific connectors can still
build tailored images by adding just the connector NARs they need on top of `apachepulsar/pulsar`.

**Independent velocity.** Connector maintainers should be able to release new connector versions
against a stable Pulsar API without coordinating with the core release train.
# Goals
## In Scope
- **Create `apache/pulsar-connectors` repository** containing all IO connector modules, with
  their own Gradle build, version catalog, and CI pipeline. The repository is forked from the
  main Pulsar repository to preserve full git history.
- **Remove connector modules from the main Pulsar repository.** Retain only:
  - `pulsar-io-core` (the connector API)
  - `pulsar-io-data-generator` (minimal connector used in integration tests)
  - The functions runtime and worker that load connectors at runtime
- **Remove the `pulsar-all` Docker image.** The image is too large and most users don't need
  all connectors in a single image. The `pulsar` image becomes the single official image.
  Tiered-storage offloader NARs, the main reason users chose `pulsar-all`, are included
  directly in the `pulsar` image.
- **Independent connector releases.** The `pulsar-connectors` repository has its own versioning
  and release cadence, independent from Pulsar releases, similar to what we already do for
  client SDKs. It can release new connector versions against any compatible Pulsar release.
- **Connector distribution packaging.** The connectors repository produces a single release
  containing all connector NARs, as a distribution tarball that users can deploy into an
  existing Pulsar installation.
## Out of Scope
- Changing the connector API (`pulsar-io-core`)
- Changing how the functions worker discovers and loads connector NARs
- A connector marketplace or registry (future enhancement)
- Splitting out tiered-storage offloaders into their own repository
# High Level Design
The split creates two repositories from what is currently one:
```
apache/pulsar (main repo)
├── pulsar-io/core/ # Connector API (retained)
├── pulsar-io/data-generator/ # Test connector (retained)
├── pulsar-functions/ # Runtime + worker (retained)
├── docker/pulsar/ # Single Docker image
└── (broker, client, etc.)
apache/pulsar-connectors (new repo)
├── aerospike/
├── aws/
├── cassandra/
├── debezium/
│ ├── core/
│ ├── mysql/
│ ├── postgres/
│ └── ...
├── elastic-search/
├── jdbc/
│ ├── core/
│ ├── postgres/
│ └── ...
├── kafka/
├── kafka-connect-adaptor/
├── kinesis/
├── rabbitmq/
├── ... (all other connectors)
├── distribution/io/ # Distribution packaging
└── docs/ # Connector docs generation
```
The connectors repository consumes Pulsar artifacts (`pulsar-io-core`, `pulsar-client`, etc.)
as external Maven dependencies, not as source dependencies. This ensures connectors build against
the published API and don't accidentally depend on internals.
# Detailed Design
## Repository Structure
The new `pulsar-connectors` repository is forked from the main Pulsar repository to preserve
git history, then trimmed to contain only connector-related modules. Connectors are promoted
from nested `pulsar-io/<name>` paths to top-level `<name>/` directories for a flatter structure.
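
As a rough illustration of the flattened layout, a `settings.gradle.kts` for the new repository might look like the sketch below. The module list is abbreviated from the tree in the High Level Design section, and the root project name is an assumption rather than something taken from the actual repository.

```kotlin
// settings.gradle.kts (illustrative sketch, not the actual file)
rootProject.name = "pulsar-connectors" // assumed root project name

// Connectors promoted from pulsar-io/<name> to top-level <name>/ directories.
include("aerospike", "cassandra", "kafka", "kafka-connect-adaptor", "kinesis", "rabbitmq")

// Connector families keep their sub-modules under a shared parent directory,
// e.g. the directory debezium/core maps to the project path :debezium:core.
include("debezium:core", "debezium:mysql", "debezium:postgres")
include("jdbc:core", "jdbc:postgres")

// Packaging module that assembles the connector NARs into a distribution.
include("distribution:io")
```

With this shape, each connector is addressed by a short project path (for example `:kafka` or `:jdbc:postgres`) instead of a path nested under `pulsar-io`.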
## Build Configuration
The connectors repository has its own:

- `settings.gradle.kts` with all connector modules
- `gradle/libs.versions.toml` with connector-specific dependency versions
- `pulsar-dependencies/` platform module pinning Pulsar artifact versions
- `build.gradle.kts` root build with shared configuration

Pulsar core artifacts are declared as dependencies with a configurable version:
```kotlin
implementation("org.apache.pulsar:pulsar-io-core:${pulsarVersion}")
```
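
One way the version pinning could work is sketched below, assuming `pulsar-dependencies` is a standard Gradle `java-platform` module and that `pulsarVersion` is supplied as a Gradle property (for example in `gradle.properties` or with `-PpulsarVersion=...`). The file path, the property name, and the choice of pinned artifacts are assumptions for illustration, not the repository's actual build.

```kotlin
// pulsar-dependencies/build.gradle.kts (hypothetical layout)
plugins {
    `java-platform`
}

// Read the Pulsar version from a Gradle property; the property name is an assumption.
// .get() fails the build early if the property is not set.
val pulsarVersion: String = providers.gradleProperty("pulsarVersion").get()

dependencies {
    constraints {
        // Pin the published Pulsar artifacts that connectors compile against.
        api("org.apache.pulsar:pulsar-io-core:$pulsarVersion")
        api("org.apache.pulsar:pulsar-client:$pulsarVersion")
    }
}
```

A connector module could then import these constraints with `implementation(platform(project(":pulsar-dependencies")))` and declare `org.apache.pulsar:pulsar-io-core` without an explicit version, so that a single property controls which Pulsar release all connectors build against.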
## Versioning Strategy
The connectors repository uses its own version scheme, independent of Pulsar's version.
All connectors are released together as a single release (not individually), and each
release specifies which Pulsar versions it is compatible with (e.g., "connectors 1.0.0
is compatible with Pulsar 4.x").
## Docker Image Changes
The `pulsar-all` image is removed. It bundled all connector NARs alongside the broker,
producing a very large image that most deployments didn't need. The main reason users chose
`pulsar-all` over `pulsar` was to get the tiered-storage offloaders.

With this change:

- Tiered-storage offloader NARs move into the `pulsar` image, eliminating the primary reason
  for `pulsar-all` to exist
- The `pulsar` Docker image becomes the single official image, containing the broker, functions
  runtime, and tiered-storage offloader NARs
- Users who need specific connectors can build tailored images by adding just the connector
  NARs they need on top of `apachepulsar/pulsar`, or mount the NARs into the container as volumes
## CI and Testing
- The main Pulsar repository's CI no longer builds or tests connectors
- The connectors repository has its own CI that builds and tests all connectors
- Integration tests that exercise specific connectors (e.g., Cassandra sink, Kafka source)
  move to the connectors repository
- The main repository retains integration tests using `data-generator` for testing the
  connector loading and runtime machinery
## Migration for Users
Users who currently use the `pulsar-all` Docker image:

1. Switch to the `pulsar` Docker image
2. Download the needed connector NARs from the connectors release
3. Mount the NARs into the container (e.g., via volume mount to `/pulsar/connectors/`)

Users who build from source:

1. Build the main Pulsar repository as before (faster, since connectors are gone)
2. Build the connectors repository separately if needed
## Public-facing Changes
### Docker Images
| Before | After |
|--------|-------|
| `pulsar` — core only | `pulsar` — core + tiered-storage offloaders |
| `pulsar-all` — core + all connectors + offloaders | *(removed)* |
### Artifacts
- All connector NARs move from the main Pulsar release to a single unified release from
  the `pulsar-connectors` repository
- All other Pulsar artifacts remain unchanged
### Configuration
No changes to broker, client, or functions worker configuration.
# Backward & Forward Compatibility
## Backward Compatibility
The connector API (`pulsar-io-core`) does not change. Existing connector NARs continue
to work with the functions worker without modification.

The `pulsar-io-core` API has been very stable for years with no breaking changes, so connectors
built against older API versions will continue to work with newer Pulsar releases and vice versa.
## Forward Compatibility
New connector releases can target older Pulsar versions, as long as the `pulsar-io-core`
API they depend on is compatible. Given the long track record of API stability, this is
expected to work seamlessly across Pulsar 4.x releases.
# Security Considerations
The split does not change the security model: connectors continue to be loaded through the
same NAR classloader isolation mechanism.

Separating connector dependencies from the main repository is expected to improve the security
posture by reducing the attack surface of the core Pulsar build and making connector dependency
updates independently releasable.
--
Matteo Merli