> On Mar 22, 2026, at 1:17 PM, Matteo Merli <[email protected]> wrote:
> 
> Hi Dave,
> 
> Good question. I don’t have any strong opinions here. I could see the case
> for:
> 
> 1. 1.0, to signal a fresh start for the Pulsar connectors component while
> still signaling maturity. This might get awkward for individual connector
> versioning, though, since we’re already at 4.x.

A lower version number on a more recent release would be awkward for a lot 
of tooling. Dependabot, for example, could be confused ...

> 2. 5.0, since this will be the first Pulsar release without the connectors.
> This would be clearer, although it would still somewhat imply a relationship
> with the core Pulsar release, which we want to break.

If versions of Pulsar prior to 5.0 will continue to include the IO connectors, 
then 5.0 would make sense. You can explain it clearly, and the LTS page can 
draw a bright line showing where to find the connectors.

> Any other suggestions?

You could start releasing one or more IO connectors at a time rather than as a 
group. They would still live in the same repository. Each connector could have 
its own version, and you could use a calendar versioning scheme for the project 
release version. This is a pattern the new ASF tooling will support, and it is 
the pattern Airflow uses for its providers. I am not suggesting that Pulsar 
follow the Sling pattern of hundreds of repositories, although that would be a 
valid alternative to this PIP.

Best,
Dave

> 
> Thanks,
> Matteo
> 
> --
> Matteo Merli
> <[email protected]>
> 
> 
> On Sun, Mar 22, 2026 at 10:27 AM Dave Fisher <[email protected]> wrote:
> 
>> It is about time to make this change.
>> 
>> One question I have: what version number will be used for the first
>> Pulsar IO Connectors release?
>> 
>> Best,
>> Dave
>> 
>>> On Mar 21, 2026, at 4:08 PM, Matteo Merli <[email protected]> wrote:
>>> 
>>> https://github.com/apache/pulsar/pull/25383
>>> 
>>> # PIP-465: Split IO Connectors into Separate Repository
>>> 
>>> # Background Knowledge
>>> 
>>> Apache Pulsar ships ~30 IO connectors (Kafka, Kinesis, Cassandra,
>>> Elasticsearch, JDBC, Debezium,
>>> etc.) as part of its main repository. These connectors are packaged as
>>> NAR files and bundled into
>>> a `pulsar-all` Docker image alongside the core broker, client, and
>>> functions runtime.
>>> 
>>> Each connector brings its own dependency tree — often large and
>>> conflicting with other connectors
>>> or with Pulsar's core dependencies. The connectors interact with
>>> Pulsar exclusively through the
>>> stable `pulsar-io-core` API, making them natural candidates for
>>> independent development and release.
>>> 
>>> # Motivation
>>> 
>>> The primary goal of this PIP is to **make development of Pulsar easier**
>>> by shrinking the core codebase. Removing ~30 connectors and their
>>> dependency trees from the main repository will significantly reduce
>>> compile time, test execution time, and CI resource consumption, and will
>>> improve CI stability.
>>> 
>>> **Build and CI impact.** Compiling and packaging 30+ connector NARs
>>> adds significant time to
>>> every CI run and local build, even when a developer is only working on
>>> the broker or client.
>>> The connectors collectively bring hundreds of transitive dependencies
>>> into the build graph,
>>> which slows down dependency resolution, inflates vulnerability reports
>>> (OWASP checks must scan
>>> connector dependencies), and creates version conflicts that require
>>> careful management in the
>>> main repository's BOM. Removing them dramatically reduces the surface
>>> area of the build.
>>> 
>>> **Release coupling.** Connectors are tied to the Pulsar release cycle.
>>> A bug fix in a single
>>> connector (e.g., updating the Elasticsearch client) requires waiting
>>> for the next Pulsar release.
>>> Conversely, a Pulsar patch release must rebuild all connectors even
>>> when none of them changed.
>>> The release cadence for connectors will be independent of Pulsar
>>> releases, similar to what
>>> we already do for client SDKs (Go, Python, Node.js).
>>> 
>>> **Low integration risk.** The `pulsar-io-core` API that connectors
>>> depend on has been very
>>> stable for a long time. There have been no breaking changes to the
>>> connector API in years,
>>> so there is essentially no risk of integration pain from this split.
>>> 
>>> **Docker image bloat.** The `pulsar-all` image bundles every connector
>>> NAR, weighing in at
>>> ~2.9 GB — a very large image that most deployments don't need. Users
>>> typically deploy only
>>> 1-2 connectors but pay the image pull cost for all of them. The main
>>> reason users chose `pulsar-all` over `pulsar` was to get the
>>> tiered-storage offloaders; this PIP addresses that by packaging the
>>> offloader NARs directly into the `pulsar` image. Users who need
>>> specific connectors can still
>>> build tailored images by adding just the connector NARs they need on
>>> top of `apachepulsar/pulsar`.
>>> 
>>> **Independent velocity.** Connector maintainers should be able to
>>> release new connector versions
>>> against a stable Pulsar API without coordinating with the core release
>>> train.
>>> 
>>> # Goals
>>> 
>>> ## In Scope
>>> 
>>> - **Create `apache/pulsar-connectors` repository** containing all IO
>>> connector modules, with
>>> their own Gradle build, version catalog, and CI pipeline. The
>>> repository is forked from the
>>> main Pulsar repository to preserve full git history.
>>> 
>>> - **Remove connector modules from the main Pulsar repository.** Retain
>>> only:
>>>   - `pulsar-io-core` (the connector API)
>>>   - `pulsar-io-data-generator` (minimal connector used in integration tests)
>>>   - The functions runtime and worker that load connectors at runtime
>>> 
>>> - **Remove the `pulsar-all` Docker image.** The image is too large and
>>> most users don't need
>>> all connectors in a single image. The `pulsar` image becomes the
>>> single official image.
>>> Tiered-storage offloader NARs — the main reason users chose
>>> `pulsar-all` — are included
>>> directly in the `pulsar` image.
>>> 
>>> - **Independent connector releases.** The `pulsar-connectors`
>>> repository has its own versioning
>>> and release cadence, independent from Pulsar releases — similar to
>>> what we already do for
>>> client SDKs. It can release new connector versions against any
>>> compatible Pulsar release.
>>> 
>>> - **Connector distribution packaging.** The connectors repository
>>> produces a single release
>>> containing all connector NARs, as a distribution tarball that users
>>> can deploy into an
>>> existing Pulsar installation.
>>> 
>>> ## Out of Scope
>>> 
>>> - Changing the connector API (`pulsar-io-core`)
>>> - Changing how the functions worker discovers and loads connector NARs
>>> - A connector marketplace or registry (future enhancement)
>>> - Splitting out tiered-storage offloaders into their own repository
>>> 
>>> # High Level Design
>>> 
>>> The split creates two repositories from what is currently one:
>>> 
>>> ```
>>> apache/pulsar (main repo)
>>> ├── pulsar-io/core/          # Connector API (retained)
>>> ├── pulsar-io/data-generator/ # Test connector (retained)
>>> ├── pulsar-functions/        # Runtime + worker (retained)
>>> ├── docker/pulsar/           # Single Docker image
>>> └── (broker, client, etc.)
>>> 
>>> apache/pulsar-connectors (new repo)
>>> ├── aerospike/
>>> ├── aws/
>>> ├── cassandra/
>>> ├── debezium/
>>> │   ├── core/
>>> │   ├── mysql/
>>> │   ├── postgres/
>>> │   └── ...
>>> ├── elastic-search/
>>> ├── jdbc/
>>> │   ├── core/
>>> │   ├── postgres/
>>> │   └── ...
>>> ├── kafka/
>>> ├── kafka-connect-adaptor/
>>> ├── kinesis/
>>> ├── rabbitmq/
>>> ├── ... (all other connectors)
>>> ├── distribution/io/         # Distribution packaging
>>> └── docs/                    # Connector docs generation
>>> ```
>>> 
>>> The connectors repository consumes Pulsar artifacts (`pulsar-io-core`,
>>> `pulsar-client`, etc.)
>>> as external Maven dependencies, not as source dependencies. This
>>> ensures connectors build against
>>> the published API and don't accidentally depend on internals.
>>> 
>>> # Detailed Design
>>> 
>>> ## Repository Structure
>>> 
>>> The new `pulsar-connectors` repository is forked from the main Pulsar
>>> repository to preserve
>>> git history, then trimmed to contain only connector-related modules.
>>> Connectors are promoted
>>> from nested `pulsar-io/<name>` paths to top-level `<name>/`
>>> directories for a flatter structure.
>>> 
>>> ## Build Configuration
>>> 
>>> The connectors repository has its own:
>>> - `settings.gradle.kts` with all connector modules
>>> - `gradle/libs.versions.toml` with connector-specific dependency versions
>>> - `pulsar-dependencies/` platform module pinning Pulsar artifact versions
>>> - `build.gradle.kts` root build with shared configuration
>>> 
>>> Pulsar core artifacts are declared as dependencies with a configurable
>>> version:
>>> ```kotlin
>>> implementation("org.apache.pulsar:pulsar-io-core:${pulsarVersion}")
>>> ```
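>>>
>>> One way the `pulsar-dependencies/` platform module and the configurable
>>> `pulsarVersion` could fit together is sketched below with Gradle's
>>> `java-platform` plugin. Reading `pulsarVersion` from `gradle.properties`
>>> (overridable with `-PpulsarVersion=...`) is an assumption, as is the
>>> wiring of the `kafka` module shown as a consumer.
>>>
>>> ```kotlin
>>> // pulsar-dependencies/build.gradle.kts (sketch)
>>> plugins {
>>>     `java-platform`
>>> }
>>>
>>> // Assumed to be set in gradle.properties (e.g. pulsarVersion=4.0.0)
>>> // and overridable on the command line with -PpulsarVersion=...
>>> val pulsarVersion: String by project
>>>
>>> dependencies {
>>>     constraints {
>>>         // Pin every consumed Pulsar artifact to the same release.
>>>         api("org.apache.pulsar:pulsar-io-core:$pulsarVersion")
>>>         api("org.apache.pulsar:pulsar-client:$pulsarVersion")
>>>     }
>>> }
>>>
>>> // kafka/build.gradle.kts (sketch): the version comes from the platform.
>>> dependencies {
>>>     implementation(platform(project(":pulsar-dependencies")))
>>>     implementation("org.apache.pulsar:pulsar-io-core")
>>> }
>>> ```
>>>
>>> Centralizing the pin in one platform module means that building the
>>> connectors against a different Pulsar release is a single property change.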
>>> 
>>> ## Versioning Strategy
>>> 
>>> The connectors repository uses its own version scheme, independent of
>>> Pulsar's version.
>>> All connectors are released together as a single release (not
>>> individually), and each
>>> release specifies which Pulsar versions it is compatible with (e.g.,
>>> "connectors 1.0.0
>>> is compatible with Pulsar 4.x").
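>>>
>>> To make a statement like "compatible with Pulsar 4.x" checkable by
>>> tooling, a release could publish the Pulsar major versions it supports.
>>> The sketch below assumes a hypothetical `ConnectorsRelease` descriptor and
>>> plain `major.minor.patch` version strings; neither is defined by this PIP.
>>>
>>> ```kotlin
>>> // Hypothetical release descriptor: the Pulsar major lines a connectors
>>> // release declares support for ("4.x" -> major version 4).
>>> data class ConnectorsRelease(
>>>     val version: String,
>>>     val supportedPulsarMajors: Set<Int>,
>>> )
>>>
>>> // Extracts the major component of a version such as "4.0.3".
>>> fun pulsarMajor(pulsarVersion: String): Int =
>>>     pulsarVersion.substringBefore('.').toIntOrNull()
>>>         ?: throw IllegalArgumentException("Unparseable version: $pulsarVersion")
>>>
>>> // True if the given Pulsar release falls within a declared major line.
>>> fun ConnectorsRelease.isCompatibleWith(pulsarVersion: String): Boolean =
>>>     pulsarMajor(pulsarVersion) in supportedPulsarMajors
>>> ```
>>>
>>> For example, `ConnectorsRelease("1.0.0", setOf(4)).isCompatibleWith("4.0.3")`
>>> returns `true`, while the same call with `"3.3.1"` returns `false`.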
>>> 
>>> ## Docker Image Changes
>>> 
>>> The `pulsar-all` image is removed. It bundled all connector NARs
>>> alongside the broker,
>>> producing a very large image that most deployments didn't need. The
>>> main reason users chose
>>> `pulsar-all` over `pulsar` was to get the tiered-storage offloaders.
>>> With this change:
>>> 
>>> - Tiered-storage offloader NARs move into the `pulsar` image,
>>> eliminating the primary reason
>>> for `pulsar-all` to exist
>>> - The `pulsar` Docker image becomes the single official image,
>>> containing the broker, functions
>>> runtime, and tiered-storage offloader NARs
>>> - Users who need specific connectors can build tailored images by
>>> adding just the connector
>>> NARs they need on top of `apachepulsar/pulsar`, or mount them via
>>> volume mounts
>>> 
>>> ## CI and Testing
>>> 
>>> - The main Pulsar repository's CI no longer builds or tests connectors
>>> - The connectors repository has its own CI that builds and tests all
>>> connectors
>>> - Integration tests that exercise specific connectors (e.g., Cassandra
>>> sink, Kafka source)
>>> move to the connectors repository
>>> - The main repository retains integration tests using `data-generator`
>>> for testing the
>>> connector loading and runtime machinery
>>> 
>>> ## Migration for Users
>>> 
>>> Users who currently use the `pulsar-all` Docker image:
>>> 1. Switch to the `pulsar` Docker image
>>> 2. Download the connector NARs they need from the connectors release
>>> 3. Mount the NARs into the container (e.g., via a volume mount to
>>> `/pulsar/connectors/`)
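>>>
>>> Since most deployments need only a few NARs out of the full distribution,
>>> the selection in step 2 can be scripted. The sketch below picks the
>>> requested connectors out of a directory listing; it assumes the
>>> distribution keeps the `pulsar-io-<name>-<version>.nar` file naming, which
>>> is an assumption rather than something this PIP specifies. The selected
>>> files would then be mounted into `/pulsar/connectors/` as in step 3.
>>>
>>> ```kotlin
>>> // Assumed naming: pulsar-io-<name>-<version>.nar. The greedy (.+) lets
>>> // multi-part names such as "kafka-connect-adaptor" match correctly.
>>> val narNamePattern = Regex("""pulsar-io-(.+)-(\d+\.\d+\.\d+)\.nar""")
>>>
>>> // Returns the NAR file names for the requested connectors, sorted by
>>> // connector name, and fails if any requested connector is missing.
>>> fun selectNars(available: List<String>, wanted: Set<String>): List<String> {
>>>     val byConnector = available.mapNotNull { file ->
>>>         narNamePattern.matchEntire(file)?.let { it.groupValues[1] to file }
>>>     }.toMap()
>>>     val missing = wanted - byConnector.keys
>>>     require(missing.isEmpty()) { "Connectors not found: $missing" }
>>>     return wanted.sorted().map { byConnector.getValue(it) }
>>> }
>>> ```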
>>> 
>>> Users who build from source:
>>> 1. Build the main Pulsar repository as before (faster, since
>>> connectors are gone)
>>> 2. Build the connectors repository separately if needed
>>> 
>>> ## Public-facing Changes
>>> 
>>> ### Docker Images
>>> 
>>> | Before | After |
>>> |--------|-------|
>>> | `pulsar` — core only | `pulsar` — core + tiered-storage offloaders |
>>> | `pulsar-all` — core + all connectors + offloaders | *(removed)* |
>>> 
>>> ### Artifacts
>>> 
>>> - All connector NARs move from the main Pulsar release to a single
>>> unified release from
>>> the `pulsar-connectors` repository
>>> - All other Pulsar artifacts remain unchanged
>>> 
>>> ### Configuration
>>> 
>>> No changes to broker, client, or functions worker configuration.
>>> 
>>> # Backward & Forward Compatibility
>>> 
>>> ## Backward Compatibility
>>> 
>>> The connector API (`pulsar-io-core`) does not change. Existing
>>> connector NARs continue
>>> to work with the functions worker without modification.
>>> 
>>> The `pulsar-io-core` API has been very stable for years with no
>>> breaking changes, so connectors
>>> built against older API versions will continue to work with newer
>>> Pulsar releases and vice versa.
>>> 
>>> ## Forward Compatibility
>>> 
>>> New connector releases can target older Pulsar versions, as long as
>>> the `pulsar-io-core`
>>> API they depend on is compatible. Given the long track record of API
>>> stability, this is
>>> expected to work seamlessly across Pulsar 4.x releases.
>>> 
>>> # Security Considerations
>>> 
>>> No security implications. Connectors continue to be loaded through the
>>> same NAR classloader
>>> isolation mechanism. The split does not change the security model.
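>>>
>>> The isolation principle can be illustrated with plain JDK class loaders:
>>> each connector archive gets its own loader whose parent provides only the
>>> shared API, so sibling connectors cannot see each other's classes and may
>>> bundle conflicting library versions. This is a simplified sketch using
>>> `java.net.URLClassLoader`; it is not Pulsar's actual NAR class loader,
>>> and `isolatedLoaders` is a hypothetical helper.
>>>
>>> ```kotlin
>>> import java.net.URL
>>> import java.net.URLClassLoader
>>>
>>> // One isolated loader per connector archive. The parent is the loader
>>> // that exposes the shared connector API; each child sees only its jars.
>>> fun isolatedLoaders(
>>>     archives: Map<String, List<URL>>,
>>>     apiLoader: ClassLoader,
>>> ): Map<String, URLClassLoader> =
>>>     archives.mapValues { (_, jars) -> URLClassLoader(jars.toTypedArray(), apiLoader) }
>>> ```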
>>> 
>>> Separating connector dependencies from the main repository actually
>>> improves security posture
>>> by reducing the attack surface of the core Pulsar build and making
>>> connector dependency
>>> updates independently releasable.
>>> 
>>> 
>>> 
>>> --
>>> Matteo Merli
>>> <[email protected]>
>> 
>> 
