Mansi Khara created CASSSIDECAR-462:
---------------------------------------
Summary: Split RestoreJobDiscoverer into a fast status-check loop
and a slow slice-discovery loop
Key: CASSSIDECAR-462
URL: https://issues.apache.org/jira/browse/CASSSIDECAR-462
Project: Sidecar for Apache Cassandra
Issue Type: Improvement
Reporter: Mansi Khara
Assignee: Mansi Khara
RestoreJobDiscoverer conflates two operations with very different costs in a
single 5-minute loop: a cheap point-read status check on known in-flight job
IDs, and an expensive full scan of restore_ranges with per-range DB writes and
work-queue submissions. Because the cheap check only runs on that 5-minute
cadence, every Sidecar instance other than the one that received a phase signal
can wait up to 5 minutes before detecting a job status transition. Split the
discoverer into a fast loop (~1 second) that reads only job.status for known
in-flight jobs and reacts immediately to any transition, and a slow loop
(existing 5-minute interval) that handles full slice discovery, restarts,
missed signals, and newly created jobs. Add a jobDiscoveryStatusCheckInterval
configuration key (default: 1 second) to RestoreJobConfigurationImpl. The slow
loop is otherwise unchanged and remains the correctness and recovery guarantee.
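A minimal sketch of the proposed split, assuming plain JDK scheduling rather than whatever executor Sidecar actually uses. The fast loop keeps a map of known in-flight job IDs, does one point read of job.status per ID per tick, and fires a callback on any change. The slow loop is simply the existing full discovery, passed in as a Runnable. StatusCheckLoop, SplitRestoreJobDiscoverer, the readStatus/onTransition hooks, and the status strings ("CREATED", "READY") are hypothetical names for illustration, not Sidecar's API. Only the ~1-second and 5-minute intervals and the jobDiscoveryStatusCheckInterval key come from this issue.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BiConsumer;
import java.util.function.Function;

// Hypothetical fast loop: point-reads job.status for known in-flight job IDs only.
final class StatusCheckLoop {
    private final Function<String, String> readStatus;     // hypothetical point read: jobId -> status
    private final BiConsumer<String, String> onTransition; // hypothetical reaction hook: (jobId, newStatus)
    private final ConcurrentMap<String, String> inFlight = new ConcurrentHashMap<>(); // jobId -> last seen status

    StatusCheckLoop(Function<String, String> readStatus, BiConsumer<String, String> onTransition) {
        this.readStatus = readStatus;
        this.onTransition = onTransition;
    }

    // Hypothetically called by the slow loop as it discovers jobs and as jobs finish.
    void track(String jobId, String status) { inFlight.put(jobId, status); }
    void untrack(String jobId) { inFlight.remove(jobId); }

    // One fast tick: no scan of restore_ranges, just one status read per tracked job.
    // Returns the number of transitions observed.
    int tick() {
        int transitions = 0;
        for (Map.Entry<String, String> entry : inFlight.entrySet()) {
            String jobId = entry.getKey();
            String previous = entry.getValue();
            String current = readStatus.apply(jobId);
            // Conditional replace: never re-adds a job the slow loop untracked meanwhile.
            if (current != null && !current.equals(previous)
                    && inFlight.replace(jobId, previous, current)) {
                onTransition.accept(jobId, current); // react now instead of on the next slow pass
                transitions++;
            }
        }
        return transitions;
    }
}

// Hypothetical wiring of both loops onto independent schedules.
final class SplitRestoreJobDiscoverer {
    static ScheduledExecutorService start(StatusCheckLoop fast,
                                          Runnable fullSliceDiscovery,  // existing slow-path work
                                          Duration statusCheckInterval, // e.g. jobDiscoveryStatusCheckInterval (1s)
                                          Duration discoveryInterval) { // existing 5-minute interval
        // Two threads so a long slow pass never delays the fast status checks.
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(2);
        scheduler.scheduleWithFixedDelay(guarded(() -> fast.tick()), 0,
                statusCheckInterval.toMillis(), TimeUnit.MILLISECONDS);
        scheduler.scheduleWithFixedDelay(guarded(fullSliceDiscovery), 0,
                discoveryInterval.toMillis(), TimeUnit.MILLISECONDS);
        return scheduler;
    }

    // An exception escaping a fixed-delay task suppresses all its later runs, so catch it per tick.
    private static Runnable guarded(Runnable task) {
        return () -> {
            try {
                task.run();
            } catch (RuntimeException e) {
                System.err.println("discovery tick failed: " + e);
            }
        };
    }
}

final class SplitDiscovererDemo {
    public static void main(String[] args) {
        Map<String, String> statusTable = new ConcurrentHashMap<>(); // stands in for the job table
        statusTable.put("job-1", "CREATED");
        List<String> events = new ArrayList<>();
        StatusCheckLoop fast = new StatusCheckLoop(statusTable::get, (id, s) -> events.add(id + "->" + s));
        fast.track("job-1", "CREATED");
        fast.tick();                       // status unchanged: nothing fires
        statusTable.put("job-1", "READY"); // another instance received the phase signal
        fast.tick();                       // seen on the next fast tick, not the next 5-minute pass
        System.out.println(events);        // prints [job-1->READY]
    }
}
```

The conditional replace keeps the fast loop from racing the slow loop's bookkeeping, and scheduling each loop with a fixed delay on its own thread means a multi-minute slice scan cannot starve the 1-second status checks.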
--
This message was sent by Atlassian Jira
(v8.20.10#820010)