Hi all,

I'd like to start a discussion on PIP-470, which proposes a broker option
to close (unload) inactive topics from broker memory without deleting their
data.

PIP: https://github.com/apache/pulsar/pull/25574/files#diff-pip-470
Prototype PR: https://github.com/apache/pulsar/pull/25574

Problem

Deployments with very large numbers of mostly-idle topics (commonly tens of
thousands to millions, with a long tail of low-traffic topics) face two
recurring problems:

   1. Broker memory pressure. Every loaded topic pins a managed ledger with
   its cache, subscription/cursor state, rate limiters, dispatchers, and
   schema references. An idle topic with no produce or consume traffic for
   hours still occupies all of that memory.
   2. Metrics cardinality. Per-topic metric series grow linearly with the
   number of loaded topics, inflating scrape payloads and monitoring cost.

Today the only built-in remedy is brokerDeleteInactiveTopicsEnabled, but
that deletes the data — which many operators explicitly do not want. Their
remaining options are:

   - Leave every idle topic loaded and pay the memory/metrics cost, or
   - Run an external cron job that polls topic stats and calls pulsar-admin
   topics unload for each idle topic. This is awkward, reimplements the
   existing inactivity detection, and adds a moving part to the deployment.

Proposal

Add a new dynamic broker configuration:

# default: false
brokerCloseInactiveTopicsEnabled=false

When enabled, the existing inactivity monitor reuses its current detection
(mode, frequency, max-inactive-duration) but performs a close — the same
code path as pulsar-admin topics unload — instead of a delete. Ledgers in
BookKeeper, subscriptions, cursors, and topic policies are all preserved;
only the in-memory topic and its broker-cache entry are released. The next
produce/consume reconnect transparently reloads the topic.
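
To make the reused detection concrete, here is a rough Java sketch of the inactivity predicate that the delete action and the proposed close action would share. It is a simplified model, not the actual PersistentTopic code: the InactivitySketch class, the TopicState fields, and the exact comparison are hypothetical, and only the two mode names mirror the existing InactiveTopicDeleteMode values.

```java
import java.time.Duration;
import java.time.Instant;

final class InactivitySketch {

    // Local stand-in mirroring the two existing detection mode names;
    // this is not Pulsar's InactiveTopicDeleteMode class.
    enum Mode { delete_when_no_subscriptions, delete_when_subscriptions_caught_up }

    // Hypothetical, simplified snapshot of a loaded topic's state.
    static final class TopicState {
        final Instant lastActive;
        final int subscriptionCount;
        final boolean allSubscriptionsCaughtUp;
        final boolean hasConnectedClients;

        TopicState(Instant lastActive, int subscriptionCount,
                   boolean allSubscriptionsCaughtUp, boolean hasConnectedClients) {
            this.lastActive = lastActive;
            this.subscriptionCount = subscriptionCount;
            this.allSubscriptionsCaughtUp = allSubscriptionsCaughtUp;
            this.hasConnectedClients = hasConnectedClients;
        }
    }

    // True if the topic counts as inactive; the caller then either closes or
    // deletes it, depending on which broker flag is enabled.
    static boolean isInactive(TopicState t, Mode mode, Duration maxInactive, Instant now) {
        if (t.hasConnectedClients) {
            return false; // connected producers or consumers keep the topic loaded
        }
        if (Duration.between(t.lastActive, now).compareTo(maxInactive) <= 0) {
            return false; // not idle for longer than the configured maximum yet
        }
        switch (mode) {
            case delete_when_no_subscriptions:
                return t.subscriptionCount == 0;
            case delete_when_subscriptions_caught_up:
                return t.allSubscriptionsCaughtUp;
            default:
                return false;
        }
    }
}
```

Keeping detection in one predicate like this is what lets the new flag change only the terminal action.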

The new flag is mutually exclusive with brokerDeleteInactiveTopicsEnabled;
broker startup fails fast if both are set to true.
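
A minimal sketch of the fail-fast check, assuming the broker validates the two flags from its loaded properties at startup. The InactiveTopicConfigCheck class and its error message are hypothetical; only the two property names come from this proposal and the existing config.

```java
import java.util.Properties;

final class InactiveTopicConfigCheck {

    static final String DELETE_KEY = "brokerDeleteInactiveTopicsEnabled";
    static final String CLOSE_KEY = "brokerCloseInactiveTopicsEnabled";

    // Hypothetical startup validation: reject a configuration that enables both actions.
    static void validate(Properties conf) {
        if (flag(conf, DELETE_KEY) && flag(conf, CLOSE_KEY)) {
            throw new IllegalArgumentException(
                    CLOSE_KEY + " and " + DELETE_KEY + " are mutually exclusive; enable at most one");
        }
    }

    // A missing key is read as false here for illustration only;
    // the broker's real defaults may differ.
    private static boolean flag(Properties conf, String key) {
        String value = conf.getProperty(key);
        return value != null && Boolean.parseBoolean(value.trim());
    }
}
```

Rejecting the combination outright avoids having to define which action wins when both flags are true.
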

Design highlights

   - Reuses brokerDeleteInactiveTopicsMode,
   brokerDeleteInactiveTopicsFrequencySeconds, and
   brokerDeleteInactiveTopicsMaxInactiveDurationSeconds for detection, so
   there is no new detection surface.
   - Wires into the existing PersistentTopic.checkGC() /
   NonPersistentTopic.checkGC() by swapping the terminal action. The
   retention-window guard is bypassed in the close branch because it exists to
   prevent data loss, which is moot when nothing is deleted.
   - No admin-API, wire-protocol, or schema changes.
   - Default is false, so the change is behavior-preserving for existing
   deployments.
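
Roughly, the swapped terminal action in checkGC() could look like the Java sketch below. GcActionSketch, TopicOps, and the withinRetentionWindow flag are hypothetical simplifications of the real topic classes; the point is only that the close branch skips the retention guard while the delete branch keeps it.

```java
import java.util.concurrent.CompletableFuture;

final class GcActionSketch {

    enum Action { NONE, CLOSE, DELETE }

    // Hypothetical stand-in for the two terminal operations on a topic.
    interface TopicOps {
        CompletableFuture<Void> close();  // like an unload: data stays in BookKeeper
        CompletableFuture<Void> delete(); // removes the topic and its data
    }

    // Picks the terminal action for a topic already flagged as inactive.
    // Startup validation is assumed to have rejected both flags being true.
    static Action chooseAction(boolean closeEnabled, boolean deleteEnabled,
                               boolean withinRetentionWindow) {
        if (closeEnabled) {
            // Retention guard skipped: closing deletes nothing, so no data is at risk.
            return Action.CLOSE;
        }
        if (deleteEnabled && !withinRetentionWindow) {
            return Action.DELETE; // existing behavior, still guarded by retention
        }
        return Action.NONE;
    }

    static CompletableFuture<Void> apply(Action action, TopicOps topic) {
        switch (action) {
            case CLOSE:
                return topic.close();
            case DELETE:
                return topic.delete();
            default:
                return CompletableFuture.completedFuture(null);
        }
    }
}
```

Returning a CompletableFuture mirrors the asynchronous style of the broker's topic operations, but the signatures here are illustrative only.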

Out of scope for v1

   - Per-topic or per-namespace overrides (broker-level only in v1; a
   follow-up can extend InactiveTopicPolicies with an action field if
   operators want per-namespace control).
   - Changes to InactiveTopicDeleteMode or InactiveTopicPolicies schema.
   - A new admin endpoint — manual unload remains available for ad-hoc use.


Regards,
Penghui
