[
https://issues.apache.org/jira/browse/CASSSIDECAR-462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18091068#comment-18091068
]
Saranya Krishnakumar commented on CASSSIDECAR-462:
--------------------------------------------------
+1 Thanks
> Split RestoreJobDiscoverer into a fast status-check loop and a slow
> slice-discovery loop
> ----------------------------------------------------------------------------------------
>
> Key: CASSSIDECAR-462
> URL: https://issues.apache.org/jira/browse/CASSSIDECAR-462
> Project: Sidecar for Apache Cassandra
> Issue Type: Improvement
> Components: Bulk Analytics
> Reporter: Mansi Khara
> Assignee: Mansi Khara
> Priority: Normal
> Time Spent: 3h 20m
> Remaining Estimate: 0h
>
> RestoreJobDiscoverer conflates two operations with very different costs in a
> single 5-minute loop: a cheap point-read status check on known in-flight job
> IDs, and an expensive full scan of restore_ranges with per-range DB writes
> and work-queue submissions. All Sidecar instances other than the one that
> received a phase signal can therefore wait up to 5 minutes before detecting a
> transition. Split the discoverer into a fast loop (~1 second) that reads only
> job.status for known in-flight jobs and reacts immediately on any transition,
> and a slow loop (existing 5-minute interval) that handles full slice
> discovery, restarts, missed signals, and newly created jobs. Add a
> jobDiscoveryStatusCheckInterval configuration key (default: 1 second) to
> RestoreJobConfigurationImpl. The slow loop is unchanged and remains the
> correctness and recovery guarantee.
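
A minimal sketch of the two-loop split described above, for illustration only. It is not the Sidecar implementation: the class name, the in-memory maps standing in for the job table, the method names, and the status strings are all hypothetical. Only the two intervals (about 1 second for the status check, 5 minutes for full discovery) come from the issue description, and here they are passed in by the caller.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; none of these names exist in Sidecar.
class TwoLoopDiscovererSketch {
    // Stand-in for the job table: job ID -> current status.
    final Map<String, String> jobTable = new ConcurrentHashMap<>();
    // Last status this instance observed for each known in-flight job.
    final Map<String, String> knownInFlight = new ConcurrentHashMap<>();
    // Detected transitions, recorded only so the behavior can be inspected.
    final List<String> transitions = new ArrayList<>();

    // Fast loop body: point-read the status of already-known jobs only.
    synchronized void checkStatuses() {
        for (Map.Entry<String, String> known : knownInFlight.entrySet()) {
            String current = jobTable.get(known.getKey());
            if (current != null && !current.equals(known.getValue())) {
                transitions.add(known.getKey() + ":" + known.getValue() + "->" + current);
                knownInFlight.put(known.getKey(), current);
            }
        }
    }

    // Slow loop body: full scan that also picks up new jobs and missed signals.
    // The real slow loop additionally scans restore_ranges and submits work.
    synchronized void discoverAll() {
        for (Map.Entry<String, String> job : jobTable.entrySet()) {
            String previous = knownInFlight.put(job.getKey(), job.getValue());
            if (previous != null && !previous.equals(job.getValue())) {
                transitions.add(job.getKey() + ":" + previous + "->" + job.getValue());
            }
        }
    }

    // Schedules both loops on independent intervals; the caller shuts the scheduler down.
    ScheduledExecutorService start(Duration statusCheckInterval, Duration discoveryInterval) {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(2);
        scheduler.scheduleWithFixedDelay(this::checkStatuses,
                statusCheckInterval.toMillis(), statusCheckInterval.toMillis(), TimeUnit.MILLISECONDS);
        scheduler.scheduleWithFixedDelay(this::discoverAll,
                0, discoveryInterval.toMillis(), TimeUnit.MILLISECONDS);
        return scheduler;
    }
}
```

In this sketch the fast loop never learns about a job on its own; it only reacts to status changes on jobs the slow loop has already discovered. That mirrors the description's point that the slow loop stays responsible for new jobs, restarts, and missed signals.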
--
This message was sent by Atlassian Jira
(v8.20.10#820010)