oversearch opened a new issue, #19968:
URL: https://github.com/apache/pulsar/issues/19968

   ### Search before asking
   
   - [X] I searched in the [issues](https://github.com/apache/pulsar/issues) 
and found nothing similar.
   
   
   ### Version
   
   2.11.0
   
   ### Minimal reproduce step
   
   Fire up a Pulsar cluster with the "inactive delete" options enabled, and set 
*everything* to self-destruct after 10 minutes: 
   ```
   brokerDeleteInactiveTopicsEnabled=true
   brokerDeleteInactiveTopicsFrequencySeconds=60
   brokerDeleteInactiveTopicsMode=delete_when_no_subscriptions
   brokerDeleteInactiveTopicsMaxInactiveDurationSeconds=600
   subscriptionExpirationTimeMinutes=10
   subscriptionExpiryCheckIntervalInMinutes=1
   ttlDurationDefaultInSeconds=600
   ```
   Create a test namespace with the default options, then publish some 
messages to *many* test topics (e.g. 100+).
   
   Restart the broker(s), and *don't* use the test namespace at all.
   Wait 35 minutes.
   List all topics in the namespace and check the logs.
   
   ### What did you expect to see?
   
   All topics should be deleted because they weren't being used, the message 
TTL would have expired, and the subscriptions were inactive.
   Logs should contain lots of messages of the form 
`[persistent://tenant/namespace/topic] Topic deleted successfully due to 
inactivity`.
   
   ### What did you see instead?
   
   The topics are never cleaned up, and never mentioned in the logs after the 
broker restart. They are clearly visible in the listing of topics when queried 
with the pulsar-admin tool.
   
   ### Anything else?
   
   I believe this happens because the broker service *only* runs expiration 
checks (for topics, subscriptions, and messages) on topics or bundles that it 
actually has loaded in its local cache.  There appear to be several 
circumstances where a particular set of topics (namespace bundle?) is not owned 
by *any* broker, and thus nothing related to those topics (messages, 
subscriptions, or the topics themselves) will *ever* be checked for expiration 
unless more messages are published to a topic in that bundle and it happens to 
get loaded.  
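
To make the suspected behavior concrete, here is a toy model in Python. It is only a sketch of the hypothesis above, not Pulsar's actual code: `surviving_topics`, its parameters, and the timestamps are all made up for illustration. It assumes the inactivity check visits only topics in the broker's local cache.

```python
def surviving_topics(topics, loaded, now_s, max_inactive_s=600):
    """Return the topics still present after one inactivity check.

    topics: dict mapping topic name -> last-active timestamp in seconds
    loaded: set of topic names currently in the (toy) broker cache
    Assumption: the check never looks at topics that are not loaded.
    """
    survivors = []
    for name, last_active in topics.items():
        expired = now_s - last_active >= max_inactive_s
        if name in loaded and expired:
            continue  # loaded and inactive long enough: deleted
        survivors.append(name)  # not loaded, or still active: kept
    return sorted(survivors)


topics = {"t1": 0, "t2": 0, "t3": 2000}
print(surviving_topics(topics, loaded={"t1", "t3"}, now_s=2100))
```

In this model `t2` is just as stale as `t1`, but it survives because it was never loaded, which matches what I see after a broker restart.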
   
   I really went down the rabbit hole on this one.  Part of the issue is that 
there isn't much documentation on how these features actually work.  I ended up 
modifying the Pulsar source to output more debugging information before 
realizing the idle topics I was seeing were never even entering the `checkGC()` 
function in `PersistentTopic.java` where all of this logic lives.  Upon 
restarting the brokers in my cluster (I've got two brokers in my test cluster), 
it looks like none of the topic bundles get loaded by default.  But it also 
appears that the broker will occasionally just unload an idle topic bundle.  I 
can see why that might be desirable, but it defeats the inactive entity purging 
features.
   
   For some background: my use case involves a "test" cluster to which a huge 
suite of integration tests publishes messages across a variety of topics and 
namespaces.  There may be many instances of the test suite running 
concurrently.  To make 
it work cleanly, we detect when our client is running in "integration test 
mode" and generate a random unique ID for the duration of the process, and 
append that to the end of every topic name, ensuring each test process is 
isolated from any others.  This will quickly cause hundreds of topics to pile 
up, and I want Pulsar to clean them up automatically so I don't have to write a 
service to do it manually.  I want them cleaned up unconditionally, because 
tests often crash and leave junk in a backlog.
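
For illustration, the topic-naming scheme described above could look roughly like the following Python sketch. The `INTEGRATION_TEST_MODE` environment variable and the `topic_name` helper are hypothetical names for this example; our real client detects test mode differently.

```python
import os
import uuid

# One random ID per process, generated once at import time.
_RUN_ID = uuid.uuid4().hex


def topic_name(base, integration_test_mode=None):
    """Append the per-process ID to the topic name when in test mode."""
    if integration_test_mode is None:
        # Hypothetical switch; stands in for however the client detects test mode.
        integration_test_mode = os.environ.get("INTEGRATION_TEST_MODE") == "1"
    if not integration_test_mode:
        return base
    return f"{base}-{_RUN_ID}"
```

Every test process therefore leaves behind its own set of uniquely named topics, which is why the automatic cleanup matters so much here.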
   
   The inactive topic/subscription cleanup features seem perfect for this - I 
have all namespaces configured with a 10 minute message TTL, 10 minute 
subscription inactivity timeout, and 10 minute topic inactivity timeout 
(`delete_when_no_subscriptions`).  So I figured that old topics should start 
falling off within about 30 minutes: 10 minutes each for the messages, 
subscriptions, and finally topics to get cleaned up.  Once I got all of this 
configured correctly, I noticed that I had dozens of old topics and 
subscriptions sticking around, and the Pulsar logs had no mention of them.  If 
I published messages to all of the topics involved to get them "active" again, 
they would usually be cleaned up after ~30 minutes - but not always!  Sometimes 
it seems the message TTL isn't being honored and the subscriptions never 
expire.  I haven't tracked that down yet...
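
The 30-minute estimate can be written out as a small Python calculation. This is a sketch of my mental model only: it assumes the three stages run strictly one after another, which I have not confirmed against the broker code. The defaults mirror the settings listed in the reproduce steps; `ttl_check_interval_s` is a hypothetical parameter, since the interval for the message TTL check is not part of the configuration shown there.

```python
def worst_case_cleanup_seconds(
    ttl_s=600,                          # ttlDurationDefaultInSeconds
    subscription_expiry_min=10,         # subscriptionExpirationTimeMinutes
    subscription_check_interval_min=1,  # subscriptionExpiryCheckIntervalInMinutes
    topic_max_inactive_s=600,           # brokerDeleteInactiveTopicsMaxInactiveDurationSeconds
    topic_check_frequency_s=60,         # brokerDeleteInactiveTopicsFrequencySeconds
    ttl_check_interval_s=0,             # hypothetical: not in the config above
):
    """Upper bound if each stage must finish before the next one starts."""
    stages = [
        ttl_s + ttl_check_interval_s,
        (subscription_expiry_min + subscription_check_interval_min) * 60,
        topic_max_inactive_s + topic_check_frequency_s,
    ]
    return sum(stages)


print(worst_case_cleanup_seconds() / 60, "minutes")
```

Under these assumptions, each stage contributes its timeout plus one check interval, so the total stays within the 35-minute wait used in the reproduce steps.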
   
   Anyway, long story short: it would be nice if the broker could be set up to 
periodically load all namespace bundles and run *all* the inactivity checks if 
they're enabled.  I did try to figure this out myself, but I'm not really a 
Java guy... I couldn't really see where to start and I admit I fear the unknown 
implications of making a change like this in such a huge code base.  Plus I'm 
mostly hoping that I'm just doing it wrong and somebody will tell me the 
special setting I've missed...
   
   Thank you.
   
   ### Are you willing to submit a PR?
   
   - [ ] I'm willing to submit a PR!

