oversearch opened a new issue, #19968: URL: https://github.com/apache/pulsar/issues/19968
### Search before asking

- [X] I searched in the [issues](https://github.com/apache/pulsar/issues) and found nothing similar.

### Version

2.11.0

### Minimal reproduce step

Fire up a Pulsar cluster with the "inactive delete" options enabled, and set *everything* to self-destruct after 10 minutes:

```
brokerDeleteInactiveTopicsEnabled=true
brokerDeleteInactiveTopicsFrequencySeconds=60
brokerDeleteInactiveTopicsMode=delete_when_no_subscriptions
brokerDeleteInactiveTopicsMaxInactiveDurationSeconds=600
subscriptionExpirationTimeMinutes=10
subscriptionExpiryCheckIntervalInMinutes=1
ttlDurationDefaultInSeconds=600
```

1. Make a test namespace with the default options.
2. Publish some messages to *many* test topics: e.g. 100+.
3. Restart the broker(s), and *don't* utilize the test namespace at all.
4. Wait 35 minutes.
5. List all topics in the namespace / check logs.

### What did you expect to see?

All topics should be deleted because they weren't being used, the message TTL would have expired, and the subscriptions were inactive. Logs should contain lots of messages of the form `[persistent://tenant/namespace/topic] Topic deleted successfully due to inactivity`.

### What did you see instead?

The topics are never cleaned up, and never mentioned in the logs after the broker restart. They are clearly visible in the listing of topics when queried with the pulsar-admin tool.

### Anything else?

I believe this is caused because the Broker Service *only* checks for expiration of topics / subscriptions / messages for topics or bundles that it actually has loaded in its local cache. There appear to be several circumstances where a particular set of topics (namespace bundle?) is not owned by *any* broker, and thus nothing related to those topics (messages, subscriptions, or topics themselves) will *ever* be checked for expiration unless more messages are published to a topic in that bundle and it happens to get loaded.

I really went down the rabbit hole on this one.
Part of the issue is that there isn't much documentation on how these features actually work. I ended up modifying the Pulsar source to output more debugging information before realizing the idle topics I was seeing were never even entering the `checkGC()` function in `PersistentTopic.java` where all of this logic lives.

Upon restarting the brokers in my cluster (I've got two brokers in my test cluster), it looks like none of the topic bundles get loaded by default. But it also appears that the broker will occasionally just unload an idle topic bundle. I can see why that might be desirable, but it defeats the inactive entity purging features.

For some background: my use case involves a "test" cluster that a huge suite of integration tests publishes messages to, across a variety of topics and namespaces. There may be many instances of the test suite running concurrently. To make it work cleanly, we detect when our client is running in "integration test mode", generate a random unique ID for the duration of the process, and append that ID to the end of every topic name, ensuring each test process is isolated from any others. This quickly causes hundreds of topics to pile up, and I want Pulsar to clean them up automatically so I don't have to write a service to do it manually. I want them cleaned up unconditionally, because tests often crash and leave junk in a backlog.

The inactive topic/subscription cleanup features seem perfect for this. I have all namespaces configured with a 10-minute message TTL, a 10-minute subscription inactivity timeout, and a 10-minute topic inactivity timeout (`delete_when_no_subscriptions`). So, I figure within about 30 minutes old topics should start falling off: 10 minutes each for the messages, subscriptions, and finally topics to get cleaned up.

Once I got all of this configured correctly, I noticed that I had dozens of old topics and subscriptions sticking around, and the Pulsar logs had no mention of them.
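
The 30-minute estimate above can be written out as simple arithmetic. The sketch below is only an illustration under stated assumptions: it assumes the three cleanup stages run strictly one after another, and that each periodic check fires at most one check interval late. Neither assumption is confirmed broker behavior. The timeouts and the two check intervals come from the configuration in this report; the message TTL check interval is not set here, so it is treated as 0. The names `Stage`, `nominal_seconds`, and `upper_estimate_seconds` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One cleanup stage: how long an entity must be idle, and how often it is checked."""
    name: str
    timeout_s: int
    check_interval_s: int


# Values taken from the configuration in this report.
# ASSUMPTION: the message TTL check interval is not configured in the report, so 0 is used.
STAGES = [
    Stage("message TTL", timeout_s=600, check_interval_s=0),
    Stage("subscription expiration", timeout_s=10 * 60, check_interval_s=1 * 60),
    Stage("inactive topic deletion", timeout_s=600, check_interval_s=60),
]


def nominal_seconds(stages):
    """Sum of the timeouts alone (the '10 minutes each' estimate)."""
    return sum(s.timeout_s for s in stages)


def upper_estimate_seconds(stages):
    """ASSUMPTION: stages are sequential and each check runs at most one interval late."""
    return sum(s.timeout_s + s.check_interval_s for s in stages)


if __name__ == "__main__":
    print("nominal minutes:", nominal_seconds(STAGES) // 60)
    print("upper estimate minutes:", upper_estimate_seconds(STAGES) // 60)
```

With the values from this report, the nominal figure is 30 minutes and the upper estimate under these assumptions is 32 minutes, which is why the reproduce steps wait 35 minutes before checking.
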
If I published messages to all of the topics involved to get them "active" again, they would usually be cleaned up after ~30 minutes - but not always! Sometimes it seems the message TTL isn't being honored and the subscriptions never expire. I haven't tracked that down yet...

Anyway, long story short: it would be nice if the broker could be set up to periodically load all namespace bundles and run *all* the inactivity checks if they're enabled.

I did try to figure this out myself, but I'm not really a Java guy... I couldn't really see where to start and I admit I fear the unknown implications of making a change like this in such a huge code base. Plus I'm mostly hoping that I'm just doing it wrong and somebody will tell me the special setting I've missed...

Thank you.

### Are you willing to submit a PR?

- [ ] I'm willing to submit a PR!

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
