chihsuan opened a new pull request, #10520:
URL: https://github.com/apache/ozone/pull/10520

   ## What changes were proposed in this pull request?
   
   After HDDS-15269 removed the 30s `stop()` hangs, 
`TestReconTaskControllerImpl` was down to ~26s, with the remaining time spent 
in tests that wait out real wall-clock time on the reinitialization retry-delay 
gate (`RETRY_DELAY_MS = 2000`). Two tests dominated:
   
   - `testNewRetryLogicWithMaxRetriesExceeded` (~13.7s): 6x `Thread.sleep(2100)`, 
i.e. ~12.6s of pure sleeping
   - `testProcessReInitializationEventWithTaskFailuresAndRetry` (~3.4s): 1x 
`Thread.sleep(2100)`
   
   The gate in `validateRetryCountAndDelay()` mixed a wall-clock time check 
with retry counting, so the tests had to sleep to advance past it. This PR 
makes the time source injectable instead of adding a config key (the delay is 
an internal sub-throttle inside an already-scheduled sync loop, so it has 
little independent operational value):
   
   - `ReconTaskControllerImpl` gets a `LongSupplier timeSource` field 
defaulting to `System::currentTimeMillis`, set via a new package-private 
`@VisibleForTesting` constructor that delegates to the existing `@Inject` 
constructor. The `@Inject` path is unchanged, so production behavior is 
identical.
   - The two sites that touch the clock (the gate read in 
`validateRetryCountAndDelay()` and the timestamp write in 
`handleEventFailure()`) now read from `timeSource`, so one seam covers both.
   - The two tests inject an `AtomicLong`-backed virtual clock and advance it 
(`testClock.addAndGet(2100)`) instead of sleeping. They still drive the full 
`queueReInitializationEvent()` path, so the `MAX_RETRIES_EXCEEDED` / 
`verify(..., times(6)).createOMCheckpoint(any())` integration coverage is 
preserved.
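   
   A minimal sketch of the seam described above, not the actual 
`ReconTaskControllerImpl` code. `RetryGate`, `recordFailure()`, `mayRetry()`, 
and the `maxRetries` parameter are hypothetical names chosen for illustration; 
only `RETRY_DELAY_MS = 2000`, the `LongSupplier` field defaulting to 
`System::currentTimeMillis`, the delegating test constructor, and the 
`AtomicLong`-backed clock are taken from this description.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Hypothetical stand-in for the retry-delay gate; names are illustrative. */
final class RetryGate {
  static final long RETRY_DELAY_MS = 2000; // value quoted in this description

  private final int maxRetries;            // assumed parameter, for illustration
  private LongSupplier timeSource = System::currentTimeMillis; // production default
  private long lastFailureTimeMs;
  private int retryCount;

  /** Stands in for the existing production constructor. */
  RetryGate(int maxRetries) {
    this.maxRetries = maxRetries;
  }

  /** Test-only constructor: delegates, then swaps in the injected clock. */
  RetryGate(int maxRetries, LongSupplier timeSource) {
    this(maxRetries);
    this.timeSource = timeSource;
  }

  /** Timestamp write: records when the latest failure happened. */
  void recordFailure() {
    lastFailureTimeMs = timeSource.getAsLong();
    retryCount++;
  }

  /** Gate read: allowed only under the retry cap and after the delay. */
  boolean mayRetry() {
    if (retryCount >= maxRetries) {
      return false;
    }
    if (retryCount == 0) {
      return true;
    }
    return timeSource.getAsLong() - lastFailureTimeMs >= RETRY_DELAY_MS;
  }
}

final class VirtualClockDemo {
  public static void main(String[] args) {
    AtomicLong testClock = new AtomicLong(0);
    RetryGate gate = new RetryGate(3, testClock::get);

    gate.recordFailure();
    System.out.println(gate.mayRetry()); // false: still inside the delay window

    testClock.addAndGet(2100);           // advance virtual time, no sleeping
    System.out.println(gate.mayRetry()); // true: delay has elapsed
  }
}
```

   Because both the write and the read go through the same `timeSource`, a 
test only has to advance one counter to move past the gate.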
   
   Note: the first item in the Jira (replacing `testFailedTaskRetryLogic`'s 
sleep with `waitFor`) was already addressed by HDDS-15449, so it is not part of 
this PR.
   
   The remaining ~20s is a per-test DB + controller setup floor (~1.1s x ~16 
tests). That is a separate concern outside this issue's scope, so this PR does 
not try to reach the Jira's "a few seconds" estimate; it only removes the 
wall-clock retry waits.
   
   ## What is the link to the Apache JIRA?
   
   https://issues.apache.org/jira/browse/HDDS-15448
   
   ## How was this patch tested?
   
   Ran the affected class locally (`mvn -pl :ozone-recon test 
-Dtest=TestReconTaskControllerImpl`):
   
   - Class total: ~35s -> ~20.6s
   - `testNewRetryLogicWithMaxRetriesExceeded`: ~13.7s -> ~1.1s
   - `testProcessReInitializationEventWithTaskFailuresAndRetry`: ~3.4s -> ~1.35s
   - `Tests run: 18, Failures: 0, Errors: 0, Skipped: 1`
   
   `checkstyle.sh`, `rat.sh`, and `author.sh` pass. No production behavior 
change (the `@Inject` constructor path and the default 
`System::currentTimeMillis` clock are unchanged).
   
   Generated-by: Claude Code (Opus 4.8)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


