vinkal-chudgar opened a new pull request, #24859:
URL: https://github.com/apache/pulsar/pull/24859
### Motivation
Fixes #24693.
ResourceGroupService unconditionally schedules two periodic tasks
(aggregating local usage and calculating quotas) during broker initialization,
executing them every resourceUsageTransportPublishIntervalInSecs (default 60
seconds) regardless of whether any tenants or namespaces are registered to
resource groups. This wastes resources on brokers where the resource group
feature is unused.
This PR implements explicit lifecycle management: periodic tasks now start
only when the first tenant or namespace is registered to any resource group,
and stop when the last registration is removed. This eliminates unnecessary
periodic task execution and resource consumption on brokers where the resource
group feature is idle.
### Modifications
- Lazy scheduling: Periodic tasks (aggregateResourceGroupLocalUsages,
calculateQuotaForAllResourceGroups) are no longer unconditionally started in
initialize(). They start only when tenant or namespace registrations are added
(via registerTenant/registerNameSpace calls) and stop when the last
registration is removed.
- Automatic cleanup: Tasks stop automatically when the last attachment is
removed, eliminating background work when resource groups become unused.
- CAS-guarded concurrency: Task lifecycle is controlled via an AtomicBoolean
with compare-and-set to avoid double start/stop and blocking.
- Dynamic config: Existing dynamic rescheduling based on
resourceUsageTransportPublishIntervalInSecs is preserved; guards prevent
reschedule attempts when schedulers are stopped.
- Docs: Updated the ServiceConfiguration field doc to reflect “run only when
in use” and to clarify behavior when a ResourceUsageTransportManager is
configured.
**Modified files**
- **ResourceGroupService.java**: add maybeStartSchedulers(),
maybeStopSchedulersIfIdle(), hasActiveAttachments(), plus small guard/logging
improvements.
- **ServiceConfiguration.java:** clarify
resourceUsageTransportPublishIntervalInSecs doc.
- **ResourceGroupServiceTest.java:** add new tests and adapt existing tests
for lazy lifecycle.
### Verifying this change
- [ ] Make sure that the change passes the CI checks.
This change is verified by unit tests. Tests are sleep free, run with per
test isolation, and use 60s timeouts.
**New unit tests (ResourceGroupServiceTest.java)**
- testLazyStartStopAndReschedule() verifies lazy start, deterministic
reschedule on interval change, and stop on last detach.
- testNoStartOnRGCreateOnly() ensures no schedulers start when a resource
group has no attachments and direct calls do not auto start.
- testStartOnTenantAttachment() confirms that a tenant only attachment
starts both schedulers and detaching the last tenant stops them.
- testStopOnLastDetachWithMixedRefs() checks that schedulers keep running
while any attachment exists and stop only after the last is removed.
- testNoRescheduleWhenStopped() ensures interval changes do not schedule
tasks when the service is idle.
**Updated unit tests (ResourceGroupServiceTest.java)**
`testClose()` :
What changed: Test now creates ResourceGroup and registers an attachment
before calling close(), then verifies tasks were cancelled and references
nulled.
**Original test behavior:**
```
ResourceGroupService service = new ResourceGroupService(pulsar, ...);
service.close();
Assert.assertTrue(service.getAggregateLocalUsagePeriodicTask().isCancelled());
// NPE!
```
**Why the old version failed:**
With lazy initialization, periodic tasks are null until the first attachment
is registered. The original test created a service without attachments and
immediately closed it, so getAggregateLocalUsagePeriodicTask() returned null,
causing a NullPointerException when calling .isCancelled().
**Updated test behavior:**
```
ResourceGroupService service = createTestResourceGroupService();
// Create RG and register attachment to trigger task initialization
service.resourceGroupCreate("testRG", rgConfig);
service.registerNameSpace("testRG", ns);
// Verify tasks are running
Assert.assertNotNull(service.getAggregateLocalUsagePeriodicTask());
Assert.assertTrue(service.isSchedulersRunning());
// Close and verify cleanup
service.close();
Assert.assertNull(service.getAggregateLocalUsagePeriodicTask()); // Now
correctly null
Assert.assertFalse(service.isSchedulersRunning());
```
**Rationale:** The updated test validates the complete lifecycle: initialize
(lazy), activate (first attachment), close (cleanup). This is more
comprehensive than the original test and accurately reflects production usage
patterns.
`testResourceGroupOps()` :
**What changed:** The test now calls `calculateQuotaForAllResourceGroups()`
before unregistering all attachments, rather than after.
**Why this change was necessary:**
The original test sequence was:
1. Register namespaces to ResourceGroup
2. Unregister all namespaces (ResourceGroup now has zero attachments)
3. Call `calculateQuotaForAllResourceGroups()`
4. Assert quota calculations occurred
This worked in the original implementation because periodic tasks ran
continuously regardless of registration state.
With this PR's lifecycle management:
- Tasks start when first registration is added
- Tasks **stop** when last registration is removed
- Both periodic task methods include early-exit guards:
```
if (!schedulersRunning.get() || resourceGroupsMap.isEmpty()) {
return; // Skip work when no active registrations
}
```
**Why the guard is correct:**
1. **Optimization goal**: The entire purpose of this PR (per issue #24693)
is to eliminate unnecessary work when ResourceGroups are unused.
2. **Production alignment**: In production, ResourceGroups without
registered tenants/namespaces have no topics to monitor and no quotas to
calculate.
3. **Test validity**: The updated test now calculates quotas **while
attachments exist**, which accurately reflects the production workflow where
quota calculations occur during active usage, not after all resources are
detached.
**Test coverage maintained:** The test still validates quota calculation
logic; it simply exercises it under realistic conditions (with active
attachments) rather than relying on incidental behavior (running calculations
on empty/stopped service).
**Test utilities**
• isSchedulersRunning() accessor annotated with
com.google.common.annotations.VisibleForTesting.
• createResourceGroupService() helper that returns a fresh
ResourceGroupService(pulsar) per test.
### Personal CI Results
Tested in Personal CI fork: [link to your fork PR]
**Status:**
- 43 checks passed (build, tests, validation)
- 3 coverage upload jobs failed due to missing Codecov token (expected in
forks)
- All core test suites passed successfully
**Evidence:** [](https://github.com/vinkal-chudgar/pulsar/pull/1/checks)
### Does this pull request potentially affect one of the following parts:
<!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. -->
*If the box was checked, please highlight the changes*
- [ ] Dependencies (add or upgrade a dependency)
- [ ] The public API
- [ ] The schema
- [ ] The default values of configurations
- [ ] The threading model
- [ ] The binary protocol
- [ ] The REST endpoints
- [ ] The admin CLI options
- [ ] The metrics
- [ ] Anything that affects deployment
### Documentation
<!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. -->
- [ ] `doc` <!-- Your PR contains doc changes. -->
- [ ] `doc-required` <!-- Your PR changes impact docs and you will update
later -->
- [ ] `doc-not-needed` <!-- Your PR changes do not impact docs -->
- [ ] `doc-complete` <!-- Docs have been already added -->
### Matching PR in forked repository
PR in forked repository: https://github.com/vinkal-chudgar/pulsar/pull/1
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]