[jira] [Created] (CASSSIDECAR-462) Split RestoreJobDiscoverer into a fast status-check loop and a slow slice-discovery loop

Mansi Khara (Jira) Fri, 15 May 2026 13:52:00 -0700

Mansi Khara created CASSSIDECAR-462:
---------------------------------------


             Summary: Split RestoreJobDiscoverer into a fast status-check loop 
and a slow slice-discovery loop
                 Key: CASSSIDECAR-462
                 URL: https://issues.apache.org/jira/browse/CASSSIDECAR-462
             Project: Sidecar for Apache Cassandra
          Issue Type: Improvement
            Reporter: Mansi Khara
            Assignee: Mansi Khara


RestoreJobDiscoverer conflates two operations with very different costs in a
single 5-minute loop: a cheap point-read status check on known in-flight job
IDs, and an expensive full scan of restore_ranges with per-range DB writes and
work-queue submissions. As a result, every Sidecar instance other than the one
that received a phase signal can wait up to 5 minutes before detecting a
transition.

Split the discoverer into a fast loop (~1 second) that reads only job.status
for known in-flight jobs and reacts immediately to any transition, and a slow
loop (existing 5-minute interval) that performs full slice discovery and covers
restarts, missed signals, and newly created jobs. Add a
jobDiscoveryStatusCheckInterval configuration key (default: 1 second) to
RestoreJobConfigurationImpl. The slow loop is unchanged and remains the
correctness and recovery guarantee.
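
The proposed split can be pictured with a minimal, self-contained sketch. It
is not the Sidecar implementation: the class TwoLoopDiscovererSketch, its
methods, the two injected functions, and the status strings used with it are
all hypothetical stand-ins, and a plain ScheduledExecutorService is assumed
in place of whatever scheduling mechanism Sidecar actually uses. Only the two
intervals and the division of work between the loops come from the
description above.

```java
import java.time.Duration;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;
import java.util.function.Supplier;

/** Illustrative only: every name in this class is hypothetical, not Sidecar code. */
class TwoLoopDiscovererSketch {
    // Last status observed locally for each known in-flight job.
    private final Map<UUID, String> knownStatus = new ConcurrentHashMap<>();
    // Recorded reactions, kept only so the behaviour is easy to inspect.
    private final List<String> transitions = new CopyOnWriteArrayList<>();

    private final Function<UUID, String> readStatus;       // stand-in for the cheap point read of job.status
    private final Supplier<Map<UUID, String>> scanAllJobs; // stand-in for the expensive full scan

    TwoLoopDiscovererSketch(Function<UUID, String> readStatus,
                            Supplier<Map<UUID, String>> scanAllJobs) {
        this.readStatus = readStatus;
        this.scanAllJobs = scanAllJobs;
    }

    /** Fast loop body: point-reads the status of already-known jobs only. */
    void checkStatuses() {
        for (UUID jobId : knownStatus.keySet()) {
            String current = readStatus.apply(jobId);
            if (current != null) {
                observe(jobId, current);
            }
        }
    }

    /** Slow loop body: full discovery; picks up new jobs and any missed transitions. */
    void discover() {
        // A real slow loop would also scan restore_ranges and submit work; omitted here.
        scanAllJobs.get().forEach(this::observe);
    }

    // Records a transition when the status differs from the last one seen.
    // Assumes both loops run on one scheduler thread, so calls do not interleave.
    private void observe(UUID jobId, String current) {
        String previous = knownStatus.put(jobId, current);
        if (previous != null && !previous.equals(current)) {
            transitions.add(jobId + ": " + previous + " -> " + current);
        }
    }

    List<String> transitions() {
        return transitions;
    }

    /** Schedules both loops; the intervals would come from configuration. */
    void start(ScheduledExecutorService scheduler,
               Duration statusCheckInterval,
               Duration discoveryInterval) {
        scheduler.scheduleWithFixedDelay(this::checkStatuses,
                statusCheckInterval.toMillis(), statusCheckInterval.toMillis(), TimeUnit.MILLISECONDS);
        scheduler.scheduleWithFixedDelay(this::discover,
                0, discoveryInterval.toMillis(), TimeUnit.MILLISECONDS);
    }
}
```

Under these assumptions, a caller would pass Duration.ofSeconds(1) and
Duration.ofMinutes(5) to start(), matching the intervals in the description.
The fast loop never learns about a job on its own; it only re-reads jobs that
the slow loop has already registered, which is why the slow loop stays the
recovery path.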



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

