lhotari commented on code in PR #25721:
URL: https://github.com/apache/pulsar/pull/25721#discussion_r3210295559
##########
pip/pip-475.md:
##########
@@ -0,0 +1,350 @@
+# PIP-475: Regular-to-Scalable Topic Migration
+
+*Sub-PIP of [PIP-460: Scalable Topics](pip-460.md)*
+
+## Motivation
+
+[PIP-460](pip-460.md) introduces scalable topics (`topic://...`) as a new topic type that supports range splitting and merging without breaking key ordering. For this to be adoptable in real deployments, users with existing partitioned and non-partitioned topics need a migration path that:
+
+1. **Doesn't require recreating their topics from scratch.** Existing topics may hold months of retained data and have many active subscriptions. Re-create + re-publish is not a viable upgrade story.
+2. **Lets clients adopt the V5 SDK before any topic is migrated.** Operationally, applications need to be upgraded one at a time over weeks, while the topics they read and write keep working as-is. The V5 SDK has to interoperate with the *old* topic types until the migration moment.
+3. **Keeps the migration moment small and surgical.** Once all clients are on the V5 SDK, an admin command flips a topic from regular to scalable in a single atomic step, without copying data or moving cursors.
+4. **Cannot be reversed.** Once a topic is scalable, regressing to a regular topic is unsafe (the new layout can have already split, leaving data in segments that don't map back to a fixed partition count). The metadata transition has to be one-way.
+
+PIP-460 lists "Tooling for migrating existing partitioned topics to scalable topics" in its postponed section. This PIP closes that gap.
+
+This PIP also clarifies the V5 SDK's behavior when given a topic name that may or may not be scalable, and tightens the broker so that a v4 client cannot accidentally write to (or auto-create) a regular topic that has already been migrated.
+
+The longer-term direction for Pulsar is for scalable topics to **fully replace** partitioned and non-partitioned topics: the existing topic types stay supported for backward compatibility, but new development on the topic surface targets scalable topics, and migration tooling like this PIP is what lets existing deployments make that transition incrementally instead of all at once.
+
+---
+
+## Background Knowledge
+
+### Topic domains in Pulsar today
+
+A Pulsar topic name encodes its domain in a URI scheme:
+
+- `persistent://t/n/x` — durable topic backed by a managed ledger.
+- `non-persistent://t/n/x` — in-memory topic, no durability.
+- `topic://t/n/x` — scalable topic introduced by PIP-460. Backed by a DAG of segments; each segment is itself a `segment://...` topic with its own managed ledger.
+

Review Comment:
One detail that we missed in Pulsar 4.2.0 is the migration from v1 topics to v2 topics. Since users might upgrade directly from 4.0.x to 5.0.x, I'd assume that v1 topics need to be handled in some way.

The znodes differ between v2 and v1 topics:

Managed ledger
- v2: /managed-ledgers/tenant/ns/persistent/topic
- v1: /managed-ledgers/tenant/cluster/ns/persistent/topic

Partitioned topic metadata
- v2: /admin/partitioned-topics/tenant/ns/persistent/topic
- v1: /admin/partitioned-topics/tenant/cluster/ns/persistent/topic

Namespace policies
- v2: /admin/policies/tenant/ns
- v1: /admin/policies/tenant/cluster/ns

A common reason why v1 topics exist in 4.0.x production deployments is that adding a slash to a topic name silently makes it a v1 topic. In 4.1.0, the configuration setting allowAutoTopicCreationWithLegacyNamingScheme was added to prevent creating v1 topics accidentally: https://github.com/apache/pulsar/pull/23620

How are we going to address the possible existence of v1 topics in a 4.0.x -> 5.0.x migration?
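To make the v1/v2 difference in the review comment concrete, here is a minimal Python sketch that derives the three znode paths listed there from a `persistent://` topic name. It is an illustration only, not Pulsar's parser: `znode_paths` and `ZnodePaths` are hypothetical names, the rule "three path components means v2, four or more means v1" is a simplifying assumption, and any escaping the broker applies to the topic's local name is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ZnodePaths:
    """Hypothetical container for the metadata paths named in the comment."""
    scheme: str                 # "v1" or "v2"
    managed_ledger: str
    partitioned_metadata: str
    namespace_policies: str


def znode_paths(topic: str) -> ZnodePaths:
    """Map a persistent topic name to its znode paths (simplified sketch)."""
    domain, sep, rest = topic.partition("://")
    if not sep or domain != "persistent":
        raise ValueError(f"expected a persistent:// topic name, got {topic!r}")

    # Assumption: split into at most four components, so a v1 local name
    # keeps any further slashes.
    parts = rest.split("/", 3)
    if not all(parts) or len(parts) < 3:
        raise ValueError(f"malformed topic name: {topic!r}")

    if len(parts) == 3:
        tenant, ns, local = parts
        scheme, namespace = "v2", f"{tenant}/{ns}"
    else:
        tenant, cluster, ns, local = parts
        scheme, namespace = "v1", f"{tenant}/{cluster}/{ns}"

    return ZnodePaths(
        scheme=scheme,
        managed_ledger=f"/managed-ledgers/{namespace}/persistent/{local}",
        partitioned_metadata=(
            f"/admin/partitioned-topics/{namespace}/persistent/{local}"
        ),
        namespace_policies=f"/admin/policies/{namespace}",
    )


if __name__ == "__main__":
    # Intended v2 topic.
    print(znode_paths("persistent://tenant/ns/topic").managed_ledger)
    # One extra slash in the topic name: parsed as v1, with "ns" taken as
    # the cluster and "orders" as the namespace.
    print(znode_paths("persistent://tenant/ns/orders/eu").managed_ledger)
```

Under these assumptions, the second call shows the accidental case the comment describes: the name `persistent://tenant/ns/orders/eu` resolves to `/managed-ledgers/tenant/ns/orders/persistent/eu`, a v1-style path in a different namespace subtree than the user probably intended. A migration tool that only scans the v2 layout would not find such topics.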
