vinkal-chudgar opened a new pull request, #24859:
URL: https://github.com/apache/pulsar/pull/24859

   ### Motivation
   
   Fixes #24693.
   
   ResourceGroupService unconditionally schedules two periodic tasks 
(aggregating local usage and calculating quotas) during broker initialization, 
executing them every resourceUsageTransportPublishIntervalInSecs (default 60 
seconds) regardless of whether any tenants or namespaces are registered to 
resource groups. This wastes resources on brokers where the resource group 
feature is unused.
   
   This PR implements explicit lifecycle management: periodic tasks now start 
only when the first tenant or namespace is registered to any resource group, 
and stop when the last registration is removed. This eliminates unnecessary 
periodic task execution and resource consumption on brokers where the resource 
group feature is idle. 
   
   ### Modifications
   
   - Lazy scheduling: Periodic tasks (aggregateResourceGroupLocalUsages, 
calculateQuotaForAllResourceGroups) are no longer unconditionally started in 
initialize(). They start only when tenant or namespace registrations are added 
(via registerTenant/registerNameSpace calls) and stop when the last 
registration is removed.
   - Automatic cleanup: Tasks stop automatically when the last attachment is 
removed, eliminating background work when resource groups become unused.
   - CAS-guarded concurrency: Task lifecycle is controlled via an AtomicBoolean 
with compare-and-set to avoid double start/stop and blocking.
   - Dynamic config: Existing dynamic rescheduling based on 
resourceUsageTransportPublishIntervalInSecs is preserved; guards prevent 
reschedule attempts when schedulers are stopped.
   - Docs: Updated the ServiceConfiguration field doc to reflect “run only when 
in use” and to clarify behavior when a ResourceUsageTransportManager is 
configured.
   
   **Modified files**
   
   - **ResourceGroupService.java**: add maybeStartSchedulers(), 
maybeStopSchedulersIfIdle(), hasActiveAttachments(), plus small guard/logging 
improvements.
   - **ServiceConfiguration.java:** clarify 
resourceUsageTransportPublishIntervalInSecs doc.
   - **ResourceGroupServiceTest.java:** add new tests and adapt existing tests 
for lazy lifecycle.
   
   ### Verifying this change
   
   - [ ] Make sure that the change passes the CI checks.
   
   This change is verified by unit tests. Tests are sleep free, run with per 
test isolation, and use 60s timeouts.
   **New unit tests (ResourceGroupServiceTest.java)**
   
   -  testLazyStartStopAndReschedule() verifies lazy start, deterministic 
reschedule on interval change, and stop on last detach.
   -  testNoStartOnRGCreateOnly() ensures no schedulers start when a resource 
group has no attachments and direct calls do not auto start.
   - testStartOnTenantAttachment() confirms that a tenant only attachment 
starts both schedulers and detaching the last tenant stops them.
   - testStopOnLastDetachWithMixedRefs() checks that schedulers keep running 
while any attachment exists and stop only after the last is removed.
   - testNoRescheduleWhenStopped() ensures interval changes do not schedule 
tasks when the service is idle.
   
   **Updated unit tests (ResourceGroupServiceTest.java)**
   `testClose()` : 
   What changed: Test now creates ResourceGroup and registers an attachment 
before calling close(), then verifies tasks were cancelled and references 
nulled.
   
   **Original test behavior:**   
   ```
   ResourceGroupService service = new ResourceGroupService(pulsar, ...);
   service.close();
   
Assert.assertTrue(service.getAggregateLocalUsagePeriodicTask().isCancelled());  
// NPE!
   ```
   **Why the old version failed:**
   With lazy initialization, periodic tasks are null until the first attachment 
is registered. The original test created a service without attachments and 
immediately closed it, so getAggregateLocalUsagePeriodicTask() returned null, 
causing a NullPointerException when calling .isCancelled().
    
   **Updated test behavior:**
   
   ```
   ResourceGroupService service = createTestResourceGroupService();
   // Create RG and register attachment to trigger task initialization
   service.resourceGroupCreate("testRG", rgConfig);
   service.registerNameSpace("testRG", ns);
   
   // Verify tasks are running
   Assert.assertNotNull(service.getAggregateLocalUsagePeriodicTask());
   Assert.assertTrue(service.isSchedulersRunning());
   
   // Close and verify cleanup
   service.close();
   Assert.assertNull(service.getAggregateLocalUsagePeriodicTask());  // Now 
correctly null
   Assert.assertFalse(service.isSchedulersRunning());
   ```
   
   **Rationale:** The updated test validates the complete lifecycle: initialize 
(lazy), activate (first attachment), close (cleanup). This is more 
comprehensive than the original test and accurately reflects production usage 
patterns.
   
    `testResourceGroupOps()` :
   **What changed:** The test now calls `calculateQuotaForAllResourceGroups()` 
before unregistering all attachments, rather than after.
   
   **Why this change was necessary:**
   
   The original test sequence was:
   1. Register namespaces to ResourceGroup
   2. Unregister all namespaces (ResourceGroup now has zero attachments)
   3. Call `calculateQuotaForAllResourceGroups()`
   4. Assert quota calculations occurred
   
   This worked in the original implementation because periodic tasks ran 
continuously regardless of registration state.
   
   With this PR's lifecycle management:
   - Tasks start when first registration is added
   - Tasks **stop** when last registration is removed
   - Both periodic task methods include early-exit guards:
   
   ```
   if (!schedulersRunning.get() || resourceGroupsMap.isEmpty()) {
        return; // Skip work when no active registrations
   }
   ```
   
   **Why the guard is correct:**
   
   1. **Optimization goal**: The entire purpose of this PR (per issue #24693) 
is to eliminate unnecessary work when ResourceGroups are unused.
   
   2. **Production alignment**: In production, ResourceGroups without 
registered tenants/namespaces have no topics to monitor and no quotas to 
calculate.
   
   3. **Test validity**: The updated test now calculates quotas **while 
attachments exist**, which accurately reflects the production workflow where 
quota calculations occur during active usage, not after all resources are 
detached.
   
   **Test coverage maintained:** The test still validates quota calculation 
logic; it simply exercises it under realistic conditions (with active 
attachments) rather than relying on incidental behavior (running calculations 
on empty/stopped service).
   
   **Test utilities**
       • isSchedulersRunning() accessor annotated with 
com.google.common.annotations.VisibleForTesting.
       • createResourceGroupService() helper that returns a fresh 
ResourceGroupService(pulsar) per test.
   
   ### Personal CI Results
   Tested in Personal CI fork: [link to your fork PR]
   
   **Status:**
   - 43 checks passed (build, tests, validation)
   - 3 coverage upload jobs failed due to missing Codecov token (expected in 
forks)
   - All core test suites passed successfully
   **Evidence:** [](https://github.com/vinkal-chudgar/pulsar/pull/1/checks)
   
   ### Does this pull request potentially affect one of the following parts:
   
   <!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. -->
   
   *If the box was checked, please highlight the changes*
   
   - [ ] Dependencies (add or upgrade a dependency)
   - [ ] The public API
   - [ ] The schema
   - [ ] The default values of configurations
   - [ ] The threading model
   - [ ] The binary protocol
   - [ ] The REST endpoints
   - [ ] The admin CLI options
   - [ ] The metrics
   - [ ] Anything that affects deployment
   
   ### Documentation
   
   <!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. -->
   
   - [ ] `doc` <!-- Your PR contains doc changes. -->
   - [ ] `doc-required` <!-- Your PR changes impact docs and you will update 
later -->
   - [ ] `doc-not-needed` <!-- Your PR changes do not impact docs -->
   - [ ] `doc-complete` <!-- Docs have been already added -->
   
   ### Matching PR in forked repository
   
   PR in forked repository: https://github.com/vinkal-chudgar/pulsar/pull/1


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to