+1

-Lari

On Tue, 24 Mar 2026 at 22:12, Matteo Merli <[email protected]> wrote:
>
> Thanks Dave,
>
> I've updated the PIP PR to specify:
>
> ```
> The initial release of `pulsar-connectors` will use the same version as
> the next Pulsar release (whether that is 4.3 or 5.0), to make the
> transition clear. After that, the connectors repository follows its own
> independent release cadence.
> ```
>
> I'll move to a VOTE
>
> --
> Matteo Merli
> <[email protected]>
>
>
> On Sun, Mar 22, 2026 at 1:36 PM Dave Fisher <[email protected]> wrote:
>
> >
> > > On Mar 22, 2026, at 1:17 PM, Matteo Merli <[email protected]> wrote:
> > >
> > > Hi Dave,
> > >
> > > Good question. I don’t have any strong opinions here. I could see the
> > > case for:
> > >
> > > 1. 1.0 to signal a fresh start for the pulsar connector component.
> > > Still signaling the “matureness”. Might get awkward for individual
> > > connector versioning though since we’re already at 4.x.
> >
> > A lower version number being a more recent release would become awkward
> > for much tooling. Dependabot could be confused ...
> >
> > > 2. 5.0 since this will be the first pulsar release without the
> > > connectors. This would be clearer, although it would still somewhat
> > > imply a relationship with the core pulsar release, which we want to
> > > break.
> >
> > If versions of Pulsar prior to 5.0 will continue to include IO
> > Connectors then 5.0 would make sense. You can explain it clearly, and
> > the LTS page can give a bright line on how to know where to find the
> > connectors.
> >
> > > Any other suggestions?
> >
> > You could start releasing one or more IO connectors at a time rather
> > than as a group. They would still be in the same repository. Each
> > connector could have its own version and you could use a calendar
> > version scheme for the project release version. This is a pattern the
> > new ASF tooling will support. It’s a pattern that Airflow uses for its
> > providers.
> > I am not suggesting that Pulsar follow the Sling pattern of 100s of
> > repositories, although that would be a valid alternative to this PIP.
> >
> > Best,
> > Dave
> >
> > >
> > > Thanks,
> > > Matteo
> > >
> > > --
> > > Matteo Merli
> > > <[email protected]>
> > >
> > >
> > > On Sun, Mar 22, 2026 at 10:27 AM Dave Fisher <[email protected]> wrote:
> > >
> > >> It is about time to make this change.
> > >>
> > >> One question I have is what version number will be used for the
> > >> Pulsar IO Connectors first release?
> > >>
> > >> Best,
> > >> Dave
> > >>
> > >>> On Mar 21, 2026, at 4:08 PM, Matteo Merli <[email protected]> wrote:
> > >>>
> > >>> https://github.com/apache/pulsar/pull/25383
> > >>>
> > >>> # PIP-465: Split IO Connectors into Separate Repository
> > >>>
> > >>> # Background Knowledge
> > >>>
> > >>> Apache Pulsar ships ~30 IO connectors (Kafka, Kinesis, Cassandra,
> > >>> Elasticsearch, JDBC, Debezium, etc.) as part of its main repository.
> > >>> These connectors are packaged as NAR files and bundled into a
> > >>> `pulsar-all` Docker image alongside the core broker, client, and
> > >>> functions runtime.
> > >>>
> > >>> Each connector brings its own dependency tree, often large and
> > >>> conflicting with other connectors or with Pulsar's core dependencies.
> > >>> The connectors interact with Pulsar exclusively through the stable
> > >>> `pulsar-io-core` API, making them natural candidates for independent
> > >>> development and release.
> > >>>
> > >>> # Motivation
> > >>>
> > >>> The primary goal of this PIP is to **make development of Pulsar
> > >>> easier** by shrinking the core codebase. Removing ~30 connectors and
> > >>> their dependency trees from the main repository will massively
> > >>> improve compile time, test execution time, CI resource consumption,
> > >>> and CI stability.
> > >>>
> > >>> **Build and CI impact.** Compiling and packaging 30+ connector NARs
> > >>> adds significant time to every CI run and local build, even when a
> > >>> developer is only working on the broker or client. The connectors
> > >>> collectively bring hundreds of transitive dependencies into the build
> > >>> graph, which slows down dependency resolution, inflates vulnerability
> > >>> reports (OWASP checks must scan connector dependencies), and creates
> > >>> version conflicts that require careful management in the main
> > >>> repository's BOM. Removing them dramatically reduces the surface area
> > >>> of the build.
> > >>>
> > >>> **Release coupling.** Connectors are tied to the Pulsar release
> > >>> cycle. A bug fix in a single connector (e.g., updating the
> > >>> Elasticsearch client) requires waiting for the next Pulsar release.
> > >>> Conversely, a Pulsar patch release must rebuild all connectors even
> > >>> when none of them changed. The release cadence for connectors will be
> > >>> independent from Pulsar releases, similar to what we already do for
> > >>> client SDKs (Go, Python, Node.js).
> > >>>
> > >>> **Low integration risk.** The `pulsar-io-core` API that connectors
> > >>> depend on has been very stable for a long time. There have been no
> > >>> breaking changes to the connector API in years, so there is
> > >>> essentially no risk of integration pain from this split.
> > >>>
> > >>> **Docker image bloat.** The `pulsar-all` image bundles every
> > >>> connector NAR, weighing in at ~2.9 GB, a very large image that most
> > >>> deployments don't need. Users typically deploy only 1-2 connectors
> > >>> but pay the image pull cost for all of them. The main reason users
> > >>> chose `pulsar-all` over `pulsar` was to get the tiered-storage
> > >>> offloaders; this PIP addresses that by packaging the offloader NARs
> > >>> directly into the `pulsar` image.
> > >>> Users who need specific connectors can still build tailored images by
> > >>> adding just the connector NARs they need on top of
> > >>> `apachepulsar/pulsar`.
> > >>>
> > >>> **Independent velocity.** Connector maintainers should be able to
> > >>> release new connector versions against a stable Pulsar API without
> > >>> coordinating with the core release train.
> > >>>
> > >>> # Goals
> > >>>
> > >>> ## In Scope
> > >>>
> > >>> - **Create `apache/pulsar-connectors` repository** containing all IO
> > >>>   connector modules, with their own Gradle build, version catalog,
> > >>>   and CI pipeline. The repository is forked from the main Pulsar
> > >>>   repository to preserve full git history.
> > >>>
> > >>> - **Remove connector modules from the main Pulsar repository.**
> > >>>   Retain only:
> > >>>   - `pulsar-io-core` (the connector API)
> > >>>   - `pulsar-io-data-generator` (minimal connector used in integration
> > >>>     tests)
> > >>>   - The functions runtime and worker that load connectors at runtime
> > >>>
> > >>> - **Remove the `pulsar-all` Docker image.** The image is too large
> > >>>   and most users don't need all connectors in a single image. The
> > >>>   `pulsar` image becomes the single official image. Tiered-storage
> > >>>   offloader NARs (the main reason users chose `pulsar-all`) are
> > >>>   included directly in the `pulsar` image.
> > >>>
> > >>> - **Independent connector releases.** The `pulsar-connectors`
> > >>>   repository has its own versioning and release cadence, independent
> > >>>   from Pulsar releases, similar to what we already do for client
> > >>>   SDKs. It can release new connector versions against any compatible
> > >>>   Pulsar release.
> > >>>
> > >>> - **Connector distribution packaging.** The connectors repository
> > >>>   produces a single release containing all connector NARs, as a
> > >>>   distribution tarball that users can deploy into an existing Pulsar
> > >>>   installation.
> > >>>
> > >>> ## Out of Scope
> > >>>
> > >>> - Changing the connector API (`pulsar-io-core`)
> > >>> - Changing how the functions worker discovers and loads connector NARs
> > >>> - A connector marketplace or registry (future enhancement)
> > >>> - Splitting out tiered-storage offloaders into their own repository
> > >>>
> > >>> # High Level Design
> > >>>
> > >>> The split creates two repositories from what is currently one:
> > >>>
> > >>> ```
> > >>> apache/pulsar (main repo)
> > >>> ├── pulsar-io/core/              # Connector API (retained)
> > >>> ├── pulsar-io/data-generator/    # Test connector (retained)
> > >>> ├── pulsar-functions/            # Runtime + worker (retained)
> > >>> ├── docker/pulsar/               # Single Docker image
> > >>> └── (broker, client, etc.)
> > >>>
> > >>> apache/pulsar-connectors (new repo)
> > >>> ├── aerospike/
> > >>> ├── aws/
> > >>> ├── cassandra/
> > >>> ├── debezium/
> > >>> │   ├── core/
> > >>> │   ├── mysql/
> > >>> │   ├── postgres/
> > >>> │   └── ...
> > >>> ├── elastic-search/
> > >>> ├── jdbc/
> > >>> │   ├── core/
> > >>> │   ├── postgres/
> > >>> │   └── ...
> > >>> ├── kafka/
> > >>> ├── kafka-connect-adaptor/
> > >>> ├── kinesis/
> > >>> ├── rabbitmq/
> > >>> ├── ... (all other connectors)
> > >>> ├── distribution/io/             # Distribution packaging
> > >>> └── docs/                        # Connector docs generation
> > >>> ```
> > >>>
> > >>> The connectors repository consumes Pulsar artifacts
> > >>> (`pulsar-io-core`, `pulsar-client`, etc.) as external Maven
> > >>> dependencies, not as source dependencies. This ensures connectors
> > >>> build against the published API and don't accidentally depend on
> > >>> internals.
> > >>>
> > >>> # Detailed Design
> > >>>
> > >>> ## Repository Structure
> > >>>
> > >>> The new `pulsar-connectors` repository is forked from the main Pulsar
> > >>> repository to preserve git history, then trimmed to contain only
> > >>> connector-related modules.
> > >>> Connectors are promoted from nested `pulsar-io/<name>` paths to
> > >>> top-level `<name>/` directories for a flatter structure.
> > >>>
> > >>> ## Build Configuration
> > >>>
> > >>> The connectors repository has its own:
> > >>> - `settings.gradle.kts` with all connector modules
> > >>> - `gradle/libs.versions.toml` with connector-specific dependency
> > >>>   versions
> > >>> - `pulsar-dependencies/` platform module pinning Pulsar artifact
> > >>>   versions
> > >>> - `build.gradle.kts` root build with shared configuration
> > >>>
> > >>> Pulsar core artifacts are declared as dependencies with a
> > >>> configurable version:
> > >>> ```kotlin
> > >>> implementation("org.apache.pulsar:pulsar-io-core:${pulsarVersion}")
> > >>> ```
> > >>>
> > >>> ## Versioning Strategy
> > >>>
> > >>> The connectors repository uses its own version scheme, independent of
> > >>> Pulsar's version. All connectors are released together as a single
> > >>> release (not individually), and each release specifies which Pulsar
> > >>> versions it is compatible with (e.g., "connectors 1.0.0 is compatible
> > >>> with Pulsar 4.x").
> > >>>
> > >>> ## Docker Image Changes
> > >>>
> > >>> The `pulsar-all` image is removed. It bundled all connector NARs
> > >>> alongside the broker, producing a very large image that most
> > >>> deployments didn't need. The main reason users chose `pulsar-all`
> > >>> over `pulsar` was to get the tiered-storage offloaders.
> > >>> With this change:
> > >>>
> > >>> - Tiered-storage offloader NARs move into the `pulsar` image,
> > >>>   eliminating the primary reason for `pulsar-all` to exist
> > >>> - The `pulsar` Docker image becomes the single official image,
> > >>>   containing the broker, functions runtime, and tiered-storage
> > >>>   offloader NARs
> > >>> - Users who need specific connectors can build tailored images by
> > >>>   adding just the connector NARs they need on top of
> > >>>   `apachepulsar/pulsar`, or mount them via volume mounts
> > >>>
> > >>> ## CI and Testing
> > >>>
> > >>> - The main Pulsar repository's CI no longer builds or tests connectors
> > >>> - The connectors repository has its own CI that builds and tests all
> > >>>   connectors
> > >>> - Integration tests that exercise specific connectors (e.g.,
> > >>>   Cassandra sink, Kafka source) move to the connectors repository
> > >>> - The main repository retains integration tests using
> > >>>   `data-generator` for testing the connector loading and runtime
> > >>>   machinery
> > >>>
> > >>> ## Migration for Users
> > >>>
> > >>> Users who currently use `pulsar-all` Docker image:
> > >>> 1. Switch to the `pulsar` Docker image
> > >>> 2. Download needed connector NARs from the connectors release
> > >>> 3. Mount NARs into the container (e.g., via volume mount to
> > >>>    `/pulsar/connectors/`)
> > >>>
> > >>> Users who build from source:
> > >>> 1. Build the main Pulsar repository as before (faster, since
> > >>>    connectors are gone)
> > >>> 2. Build the connectors repository separately if needed
> > >>>
> > >>> ## Public-facing Changes
> > >>>
> > >>> ### Docker Images
> > >>>
> > >>> | Before | After |
> > >>> |--------|-------|
> > >>> | `pulsar` (core only) | `pulsar` (core + tiered-storage offloaders) |
> > >>> | `pulsar-all` (core + all connectors + offloaders) | *(removed)* |
> > >>>
> > >>> ### Artifacts
> > >>>
> > >>> - All connector NARs move from the main Pulsar release to a single
> > >>>   unified release from the `pulsar-connectors` repository
> > >>> - All other Pulsar artifacts remain unchanged
> > >>>
> > >>> ### Configuration
> > >>>
> > >>> No changes to broker, client, or functions worker configuration.
> > >>>
> > >>> # Backward & Forward Compatibility
> > >>>
> > >>> ## Backward Compatibility
> > >>>
> > >>> The connector API (`pulsar-io-core`) does not change. Existing
> > >>> connector NARs continue to work with the functions worker without
> > >>> modification.
> > >>>
> > >>> The `pulsar-io-core` API has been very stable for years with no
> > >>> breaking changes, so connectors built against older API versions will
> > >>> continue to work with newer Pulsar releases and vice versa.
> > >>>
> > >>> ## Forward Compatibility
> > >>>
> > >>> New connector releases can target older Pulsar versions, as long as
> > >>> the `pulsar-io-core` API they depend on is compatible. Given the long
> > >>> track record of API stability, this is expected to work seamlessly
> > >>> across Pulsar 4.x releases.
> > >>>
> > >>> # Security Considerations
> > >>>
> > >>> No security implications. Connectors continue to be loaded through
> > >>> the same NAR classloader isolation mechanism. The split does not
> > >>> change the security model.
> > >>>
> > >>> Separating connector dependencies from the main repository actually
> > >>> improves security posture by reducing the attack surface of the core
> > >>> Pulsar build and making connector dependency updates independently
> > >>> releasable.
> > >>>
> > >>> --
> > >>> Matteo Merli
> > >>> <[email protected]>
> > >>
> > >
