+1

-Lari

On Tue, 24 Mar 2026 at 22:12, Matteo Merli <[email protected]> wrote:
>
> Thanks Dave,
>
> I've updated the PIP PR to specify:
>
> ```
> The initial release of `pulsar-connectors` will use the same version as the
> next Pulsar release (whether that is 4.3 or 5.0), to make the transition
> clear. After that, the connectors repository follows its own independent
> release cadence.
> ```
>
>
> I'll move to a VOTE
>
> --
> Matteo Merli
> <[email protected]>
>
>
> On Sun, Mar 22, 2026 at 1:36 PM Dave Fisher <[email protected]> wrote:
>
> >
> >
> > > On Mar 22, 2026, at 1:17 PM, Matteo Merli <[email protected]> wrote:
> > >
> > > Hi Dave,
> > >
> > > Good question. I don’t have any strong opinions here. I could see the
> > > case for:
> > >
> > > 1. 1.0, to signal a fresh start for the Pulsar connectors component while
> > > still signaling maturity. It might get awkward for individual connector
> > > versioning, though, since we’re already at 4.x.
> >
> > A lower version number on a more recent release would be awkward for a lot
> > of tooling. Dependabot, for example, could be confused.
> >
> > > 2. 5.0, since this will be the first Pulsar release without the
> > > connectors. This would be clearer, although it would still somewhat imply
> > > a relationship with the core Pulsar release, which we want to break.
> >
> > If versions of Pulsar prior to 5.0 will continue to include IO Connectors
> > then 5.0 would make sense. You can explain it clearly, and on the LTS page
> > you can draw a bright line on how to know where to find the connectors.
> >
> > > Any other suggestions?
> >
> > You could start releasing one or more IO connectors at a time rather than
> > as a group. They would still be in the same repository. Each connector
> > could have its own version, and you could use a calendar versioning scheme
> > for the project release version. This is a pattern the new ASF tooling will
> > support, and it is the pattern Airflow uses for its providers. I am not
> > suggesting that Pulsar follow the Sling pattern of 100s of repositories,
> > although that would be a valid alternative to this PIP.
> >
> > Best,
> > Dave
> >
> > >
> > > Thanks,
> > > Matteo
> > >
> > > --
> > > Matteo Merli
> > > <[email protected]>
> > >
> > >
> > > On Sun, Mar 22, 2026 at 10:27 AM Dave Fisher <[email protected]> wrote:
> > >
> > >> It is about time to make this change.
> > >>
> > >> One question I have: what version number will be used for the first
> > >> release of the Pulsar IO Connectors?
> > >>
> > >> Best,
> > >> Dave
> > >>
> > >>> On Mar 21, 2026, at 4:08 PM, Matteo Merli <[email protected]> wrote:
> > >>>
> > >>> https://github.com/apache/pulsar/pull/25383
> > >>>
> > >>> # PIP-465: Split IO Connectors into Separate Repository
> > >>>
> > >>> # Background Knowledge
> > >>>
> > >>> Apache Pulsar ships ~30 IO connectors (Kafka, Kinesis, Cassandra,
> > >>> Elasticsearch, JDBC, Debezium,
> > >>> etc.) as part of its main repository. These connectors are packaged as
> > >>> NAR files and bundled into
> > >>> a `pulsar-all` Docker image alongside the core broker, client, and
> > >>> functions runtime.
> > >>>
> > >>> Each connector brings its own dependency tree — often large and
> > >>> conflicting with other connectors
> > >>> or with Pulsar's core dependencies. The connectors interact with
> > >>> Pulsar exclusively through the
> > >>> stable `pulsar-io-core` API, making them natural candidates for
> > >>> independent development and release.
> > >>>
> > >>> # Motivation
> > >>>
> > >>> The primary goal of this PIP is to **make development of Pulsar
> > >>> easier** by shrinking the core
> > >>> codebase. Removing ~30 connectors and their dependency trees from the
> > >>> main repository will
> > >>> massively improve compile time, test execution time, CI resource
> > >>> consumption, and CI stability.
> > >>>
> > >>> **Build and CI impact.** Compiling and packaging 30+ connector NARs
> > >>> adds significant time to
> > >>> every CI run and local build, even when a developer is only working on
> > >>> the broker or client.
> > >>> The connectors collectively bring hundreds of transitive dependencies
> > >>> into the build graph,
> > >>> which slows down dependency resolution, inflates vulnerability reports
> > >>> (OWASP checks must scan
> > >>> connector dependencies), and creates version conflicts that require
> > >>> careful management in the
> > >>> main repository's BOM. Removing them dramatically reduces the surface
> > >>> area of the build.
> > >>>
> > >>> **Release coupling.** Connectors are tied to the Pulsar release cycle.
> > >>> A bug fix in a single
> > >>> connector (e.g., updating the Elasticsearch client) requires waiting
> > >>> for the next Pulsar release.
> > >>> Conversely, a Pulsar patch release must rebuild all connectors even
> > >>> when none of them changed.
> > >>> The release cadence for connectors will be independent from Pulsar
> > >>> releases, similar to what
> > >>> we already do for client SDKs (Go, Python, Node.js).
> > >>>
> > >>> **Low integration risk.** The `pulsar-io-core` API that connectors
> > >>> depend on has been very
> > >>> stable for a long time. There have been no breaking changes to the
> > >>> connector API in years,
> > >>> so there is essentially no risk of integration pain from this split.
> > >>>
> > >>> **Docker image bloat.** The `pulsar-all` image bundles every connector
> > >>> NAR, weighing in at ~2.9 GB, a very large image that most deployments
> > >>> don't need. Users typically deploy only 1-2 connectors but pay the image
> > >>> pull cost for all of them. The main reason users chose `pulsar-all` over
> > >>> `pulsar` was to get the tiered-storage offloaders; this PIP addresses
> > >>> that by packaging the offloader NARs directly into the `pulsar` image.
> > >>> Users who need specific connectors can still build tailored images by
> > >>> adding just the connector NARs they need on top of `apachepulsar/pulsar`.
> > >>>
> > >>> **Independent velocity.** Connector maintainers should be able to
> > >>> release new connector versions against a stable Pulsar API without
> > >>> coordinating with the core release train.
> > >>>
> > >>> # Goals
> > >>>
> > >>> ## In Scope
> > >>>
> > >>> - **Create `apache/pulsar-connectors` repository** containing all IO
> > >>> connector modules, with
> > >>> their own Gradle build, version catalog, and CI pipeline. The
> > >>> repository is forked from the
> > >>> main Pulsar repository to preserve full git history.
> > >>>
> > >>> - **Remove connector modules from the main Pulsar repository.** Retain only:
> > >>>   - `pulsar-io-core` (the connector API)
> > >>>   - `pulsar-io-data-generator` (minimal connector used in integration tests)
> > >>>   - The functions runtime and worker that load connectors at runtime
> > >>>
> > >>> - **Remove the `pulsar-all` Docker image.** The image is too large and
> > >>> most users don't need
> > >>> all connectors in a single image. The `pulsar` image becomes the
> > >>> single official image.
> > >>> Tiered-storage offloader NARs — the main reason users chose
> > >>> `pulsar-all` — are included
> > >>> directly in the `pulsar` image.
> > >>>
> > >>> - **Independent connector releases.** The `pulsar-connectors`
> > >>> repository has its own versioning
> > >>> and release cadence, independent from Pulsar releases — similar to
> > >>> what we already do for
> > >>> client SDKs. It can release new connector versions against any
> > >>> compatible Pulsar release.
> > >>>
> > >>> - **Connector distribution packaging.** The connectors repository
> > >>> produces a single release
> > >>> containing all connector NARs, as a distribution tarball that users
> > >>> can deploy into an
> > >>> existing Pulsar installation.
> > >>>
> > >>> ## Out of Scope
> > >>>
> > >>> - Changing the connector API (`pulsar-io-core`)
> > >>> - Changing how the functions worker discovers and loads connector NARs
> > >>> - A connector marketplace or registry (future enhancement)
> > >>> - Splitting out tiered-storage offloaders into their own repository
> > >>>
> > >>> # High Level Design
> > >>>
> > >>> The split creates two repositories from what is currently one:
> > >>>
> > >>> ```
> > >>> apache/pulsar (main repo)
> > >>> ├── pulsar-io/core/          # Connector API (retained)
> > >>> ├── pulsar-io/data-generator/ # Test connector (retained)
> > >>> ├── pulsar-functions/        # Runtime + worker (retained)
> > >>> ├── docker/pulsar/           # Single Docker image
> > >>> └── (broker, client, etc.)
> > >>>
> > >>> apache/pulsar-connectors (new repo)
> > >>> ├── aerospike/
> > >>> ├── aws/
> > >>> ├── cassandra/
> > >>> ├── debezium/
> > >>> │   ├── core/
> > >>> │   ├── mysql/
> > >>> │   ├── postgres/
> > >>> │   └── ...
> > >>> ├── elastic-search/
> > >>> ├── jdbc/
> > >>> │   ├── core/
> > >>> │   ├── postgres/
> > >>> │   └── ...
> > >>> ├── kafka/
> > >>> ├── kafka-connect-adaptor/
> > >>> ├── kinesis/
> > >>> ├── rabbitmq/
> > >>> ├── ... (all other connectors)
> > >>> ├── distribution/io/         # Distribution packaging
> > >>> └── docs/                    # Connector docs generation
> > >>> ```
> > >>>
> > >>> The connectors repository consumes Pulsar artifacts (`pulsar-io-core`,
> > >>> `pulsar-client`, etc.)
> > >>> as external Maven dependencies, not as source dependencies. This
> > >>> ensures connectors build against
> > >>> the published API and don't accidentally depend on internals.
> > >>>
> > >>> # Detailed Design
> > >>>
> > >>> ## Repository Structure
> > >>>
> > >>> The new `pulsar-connectors` repository is forked from the main Pulsar
> > >>> repository to preserve
> > >>> git history, then trimmed to contain only connector-related modules.
> > >>> Connectors are promoted
> > >>> from nested `pulsar-io/<name>` paths to top-level `<name>/`
> > >>> directories for a flatter structure.
> > >>>
> > >>> ## Build Configuration
> > >>>
> > >>> The connectors repository has its own:
> > >>> - `settings.gradle.kts` with all connector modules
> > >>> - `gradle/libs.versions.toml` with connector-specific dependency versions
> > >>> - `pulsar-dependencies/` platform module pinning Pulsar artifact versions
> > >>> - `build.gradle.kts` root build with shared configuration
> > >>>
> > >>> Pulsar core artifacts are declared as dependencies with a configurable version:
> > >>> ```kotlin
> > >>> implementation("org.apache.pulsar:pulsar-io-core:${pulsarVersion}")
> > >>> ```
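> > >>>
> > >>> As an illustrative sketch only, the `pulsar-dependencies/` platform
> > >>> module might pin those versions along the following lines. The file
> > >>> contents, the `pulsarVersion` Gradle property, and the choice of the
> > >>> `java-platform` plugin are assumptions, not something this PIP specifies:
> > >>> ```kotlin
> > >>> // pulsar-dependencies/build.gradle.kts (hypothetical contents)
> > >>> plugins {
> > >>>     `java-platform`
> > >>> }
> > >>>
> > >>> // Assumption: the Pulsar version is supplied as a Gradle property,
> > >>> // e.g. -PpulsarVersion=<version> or an entry in gradle.properties.
> > >>> val pulsarVersion: String = providers.gradleProperty("pulsarVersion").get()
> > >>>
> > >>> dependencies {
> > >>>     constraints {
> > >>>         // Pin the Pulsar artifacts named in this PIP to a single version.
> > >>>         api("org.apache.pulsar:pulsar-io-core:$pulsarVersion")
> > >>>         api("org.apache.pulsar:pulsar-client:$pulsarVersion")
> > >>>     }
> > >>> }
> > >>> ```
> > >>>
> > >>> Under that assumption, a connector module could import the platform with
> > >>> `implementation(platform(project(":pulsar-dependencies")))` and declare
> > >>> `org.apache.pulsar:pulsar-io-core` without an explicit version.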
> > >>>
> > >>> ## Versioning Strategy
> > >>>
> > >>> The connectors repository uses its own version scheme, independent of
> > >>> Pulsar's version.
> > >>> All connectors are released together as a single release (not
> > >>> individually), and each
> > >>> release specifies which Pulsar versions it is compatible with (e.g.,
> > >>> "connectors 1.0.0
> > >>> is compatible with Pulsar 4.x").
> > >>>
> > >>> ## Docker Image Changes
> > >>>
> > >>> The `pulsar-all` image is removed. It bundled all connector NARs
> > >>> alongside the broker,
> > >>> producing a very large image that most deployments didn't need. The
> > >>> main reason users chose
> > >>> `pulsar-all` over `pulsar` was to get the tiered-storage offloaders.
> > >>> With this change:
> > >>>
> > >>> - Tiered-storage offloader NARs move into the `pulsar` image,
> > >>> eliminating the primary reason
> > >>> for `pulsar-all` to exist
> > >>> - The `pulsar` Docker image becomes the single official image,
> > >>> containing the broker, functions
> > >>> runtime, and tiered-storage offloader NARs
> > >>> - Users who need specific connectors can build tailored images by
> > >>> adding just the connector
> > >>> NARs they need on top of `apachepulsar/pulsar`, or mount them via
> > >>> volume mounts
> > >>>
> > >>> ## CI and Testing
> > >>>
> > >>> - The main Pulsar repository's CI no longer builds or tests connectors
> > >>> - The connectors repository has its own CI that builds and tests all connectors
> > >>> - Integration tests that exercise specific connectors (e.g., Cassandra
> > >>>   sink, Kafka source) move to the connectors repository
> > >>> - The main repository retains integration tests using `data-generator`
> > >>>   for testing the connector loading and runtime machinery
> > >>>
> > >>> ## Migration for Users
> > >>>
> > >>> Users who currently use `pulsar-all` Docker image:
> > >>> 1. Switch to the `pulsar` Docker image
> > >>> 2. Download needed connector NARs from the connectors release
> > >>> 3. Mount NARs into the container (e.g., via volume mount to
> > >>> `/pulsar/connectors/`)
> > >>>
> > >>> Users who build from source:
> > >>> 1. Build the main Pulsar repository as before (faster, since
> > >>> connectors are gone)
> > >>> 2. Build the connectors repository separately if needed
> > >>>
> > >>> ## Public-facing Changes
> > >>>
> > >>> ### Docker Images
> > >>>
> > >>> | Before | After |
> > >>> |--------|-------|
> > >>> | `pulsar` — core only | `pulsar` — core + tiered-storage offloaders |
> > >>> | `pulsar-all` — core + all connectors + offloaders | *(removed)* |
> > >>>
> > >>> ### Artifacts
> > >>>
> > >>> - All connector NARs move from the main Pulsar release to a single
> > >>> unified release from
> > >>> the `pulsar-connectors` repository
> > >>> - All other Pulsar artifacts remain unchanged
> > >>>
> > >>> ### Configuration
> > >>>
> > >>> No changes to broker, client, or functions worker configuration.
> > >>>
> > >>> # Backward & Forward Compatibility
> > >>>
> > >>> ## Backward Compatibility
> > >>>
> > >>> The connector API (`pulsar-io-core`) does not change. Existing
> > >>> connector NARs continue
> > >>> to work with the functions worker without modification.
> > >>>
> > >>> The `pulsar-io-core` API has been very stable for years with no
> > >>> breaking changes, so connectors
> > >>> built against older API versions will continue to work with newer
> > >>> Pulsar releases and vice versa.
> > >>>
> > >>> ## Forward Compatibility
> > >>>
> > >>> New connector releases can target older Pulsar versions, as long as
> > >>> the `pulsar-io-core`
> > >>> API they depend on is compatible. Given the long track record of API
> > >>> stability, this is
> > >>> expected to work seamlessly across Pulsar 4.x releases.
> > >>>
> > >>> # Security Considerations
> > >>>
> > >>> No security implications. Connectors continue to be loaded through the
> > >>> same NAR classloader
> > >>> isolation mechanism. The split does not change the security model.
> > >>>
> > >>> Separating connector dependencies from the main repository actually
> > >>> improves security posture
> > >>> by reducing the attack surface of the core Pulsar build and making
> > >>> connector dependency
> > >>> updates independently releasable.
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>> Matteo Merli
> > >>> <[email protected]>
> > >>
> > >>
> >
> >
