wolfstudy commented on PR #25774: URL: https://github.com/apache/pulsar/pull/25774#issuecomment-4466761361
> Thanks @wolfstudy for the proposal. > > **My main concern is architectural rather than technical: I'd like this to build on the storage-layer pluggability introduced by [PIP-384](https://github.com/apache/pulsar/blob/master/pip/pip-384.md) rather than adding this complexity to core Pulsar.** > > PIP-384 deliberately decoupled `ManagedLedgerStorage` and introduced `ManagedLedgerStorageClass` so that the BookKeeper client is owned by a storage class, and a broker can hold multiple storage classes — each topic resolving to one of them via persistence policy. The framing PIP-384 establishes is "this topic lives on this storage backend." The core essence of PIP-477 — moving writes to a new BK cluster while preserving readability of historical data on the old one — fits naturally as an extension of that framing: **a topic can move between storage classes, with per-ledger attribution determining which storage class serves each historical read**. > > Concretely, I think the path forward looks like this: > > 1. **Implement PIP-477 first as a custom set of implementations of the existing interfaces, outside of core Pulsar.** A custom `ManagedLedgerStorage` can hold a `Map<clusterName, BookkeeperManagedLedgerStorageClass>` and expose the active one — this covers what `DynamicBookKeeperClientFactory` does. The orchestrator state machine, the Broker-ZK registry under `/admin/bookie-clusters/*`, the watcher, and the CLI/REST surface can live in the plugin module. The REST endpoints under `/admin/v2/bookie-clusters/*` can be exposed via [`AdditionalServlet`](https://github.com/apache/pulsar/blob/master/pulsar-broker-common/src/main/java/org/apache/pulsar/broker/web/plugin/servlet/AdditionalServlet.java) without touching core REST classes. > 2. **The per-ledger attribution slot already exists in core.** [PIP-404](https://github.com/apache/pulsar/blob/master/pip/pip-404.md) added `repeated KeyValue properties = 6` to `LedgerInfo` and exposed `ManagedLedger#asyncAddLedgerProperty` / `asyncRemoveLedgerProperty` / `asyncGetLedgerProperty` precisely as a generic extension slot for plugins. The `★ new` `repeated KeyValue properties = 7` proposed in §"proto Changes" of PIP-477 is duplicating what's already there — `bookieClusterName` can be encoded as a property in the existing `LedgerInfo.properties` slot today, with no core proto change. `ManagedCursorInfo` similarly already has `cursorProperties` (a `repeated StringProperty` at tag 8) which can carry the same attribution. > 3. **The CLI surface is also already pluggable.** [PIP-201](https://github.com/apache/pulsar/blob/master/pip/pip-201.md) added the `CustomCommandFactory` / `CustomCommandGroup` / `CustomCommand` SPI (in `pulsar-client-tools-api`), bundled as a `.nar`, so `pulsar-admin bookie-clusters …` can be shipped entirely in the plugin with no `pulsar-admin` core change. > 4. **Where there are real SPI gaps, add narrow hooks — don't inline the feature.** A custom `ManagedLedgerFactory` returned by the storage class already gets you most of the way; the remainder should be small, focused additions, narrower than PIP-477 and useful to anyone writing a custom storage class. > 5. **If the plugin proves valuable to other users of Pulsar later, it can be moved into the Pulsar tree** as a separate module shipped alongside the broker, opt-in via configuration. The ~99% of users who never use this feature pay zero cost in core surface area, the orchestrator code can evolve on its own cadence. > > This approach also provides a forcing function for a cleaner abstraction. PIP-384 currently assumes "one topic, one storage class for its lifetime." Generalizing to "one topic, one storage class at a time, with attribution-driven routing for previously-written ledgers" is a natural and reusable extension — useful well beyond the BK-cluster-switching scenario (e.g., per-topic storage-tier migration, gradual rollout of an alternative `ManagedLedgerStorage` implementation). > > Would you be open to reframing the PIP along these lines? > > Happy to discuss further on the dev list or here. Thanks for the thorough review. I agree to reframe PIP-470 along the PIP-384 extension axis: - **Drop the new proto fields**. Encode bookieClusterName into LedgerInfo.properties (already added by PIP-404) and ManagedCursorInfo.cursorProperties (already exists). The remaining open question is SchemaStorageFormat.PositionInfo, where no equivalent properties slot exists today — I'll either propose a tiny additive slot or fold attribution into SchemaLocator properties as a follow-up. - **Ship as a plugin NAR module** (pulsar-bookie-cluster-switching) that provides a custom ManagedLedgerStorage holding a Map<clusterName, BookkeeperManagedLedgerStorageClass>. Active cluster selection, ZK registry under /admin/bookie-clusters/*, watcher, and orchestrator (BUILD/PROMOTE/CLEANUP) all live in the plugin. - **REST surface** moves to AdditionalServlet; CLI moves to PIP-201 CustomCommandFactory NAR. - **Narrow core hooks** I still need (these would be useful to any custom storage class, not just BK switching): - ManagedLedgerConfig.activeBookKeeperSupplier — per-call active BK resolution on the write path. - BookkeeperSchemaStorage.schemaReadBookKeeperResolver — analogous read-path resolution for schema ledgers. - An owner-broker-callable hook for cursor CAS under the ManagedCursorImpl lock (or expose it as an internal endpoint via AdditionalServlet). - **Opt-in via configuration**; the ~99% of users that don't switch BK clusters pay zero cost in core. I'll update the PIP draft accordingly and post the revised version on the dev list. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
