[ https://issues.apache.org/jira/browse/CASSSIDECAR-274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18016891#comment-18016891 ]
Andrés Beck-Ruiz commented on CASSSIDECAR-274:
----------------------------------------------

I agree that having a single centralized API for all operations is ideal from a user experience perspective.

{quote}Is there a particular benefit (or limitation with the existing framework) to keeping these operation types distinct at the API level that I might be missing?{quote}

These are the original limitations I found in the existing framework, which is where it might need to be enhanced:
* The framework does not persist jobs if a Sidecar instance crashes, which CASSSIDECAR-341 would address.
* Sidecar instances can't query the status of non-local jobs, which could also be addressed by CASSSIDECAR-341.
* There is no way to update the state of a currently running job. The ability to pause or abort a restart would be important for ensuring operational safety.
* The current [OperationalJobResponse|https://github.com/apache/cassandra-sidecar/blob/trunk/client-common/src/main/java/org/apache/cassandra/sidecar/common/response/OperationalJobResponse.java] is not verbose enough to give proper visibility into a restart job or other cluster-wide operations. For example, an operator would need to know which individual nodes have failed or succeeded to restart.

I have a draft of a CEP for approaching rolling restarts via Sidecar ready. It includes a design for durable, cluster-accessible operations that could address CASSSIDECAR-341, as well as an extensible approach to cluster-wide operations. I am planning to open it so that the larger community can give feedback, and I am open to further discussion about how this API could be organized and whether we should extend the current job management framework.
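To make the last two limitations concrete, here is a minimal Java sketch of the kind of operator-controlled state and per-node status a cluster-wide job might track. Every name in it (`ClusterJobStatus`, `JobState`, `NodeState`, `pause`, `resume`, `abort`, `recordNode`) is hypothetical and is not part of Sidecar or of `OperationalJobResponse`; it only illustrates pausing or aborting a running job and reporting which nodes succeeded or failed.

```java
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: none of these types exist in Cassandra Sidecar.
enum JobState { RUNNING, PAUSED, ABORTED, SUCCEEDED, FAILED }

enum NodeState { PENDING, SUCCEEDED, FAILED }

final class ClusterJobStatus {
    private final String jobId;
    private JobState state = JobState.RUNNING;
    // LinkedHashMap keeps hosts in the order they were supplied (e.g. restart order).
    private final Map<String, NodeState> nodes = new LinkedHashMap<>();

    ClusterJobStatus(String jobId, Iterable<String> hosts) {
        this.jobId = jobId;
        for (String host : hosts) {
            nodes.put(host, NodeState.PENDING);
        }
    }

    String jobId() { return jobId; }

    synchronized JobState state() { return state; }

    // Returns a copy so callers (e.g. a status endpoint) cannot mutate job state.
    synchronized Map<String, NodeState> nodes() { return new LinkedHashMap<>(nodes); }

    // Operator-driven transitions: only a running job can be paused,
    // only a paused job can be resumed, and only a non-terminal job can be aborted.
    synchronized void pause() {
        requireState(EnumSet.of(JobState.RUNNING), "pause");
        state = JobState.PAUSED;
    }

    synchronized void resume() {
        requireState(EnumSet.of(JobState.PAUSED), "resume");
        state = JobState.RUNNING;
    }

    synchronized void abort() {
        requireState(EnumSet.of(JobState.RUNNING, JobState.PAUSED), "abort");
        state = JobState.ABORTED;
    }

    // Records the outcome for one node; a running job becomes SUCCEEDED or FAILED
    // once every node has an outcome.
    synchronized void recordNode(String host, boolean success) {
        if (!nodes.containsKey(host)) {
            throw new IllegalArgumentException("unknown host " + host);
        }
        nodes.put(host, success ? NodeState.SUCCEEDED : NodeState.FAILED);
        boolean allDone = nodes.values().stream()
                .allMatch(s -> s == NodeState.SUCCEEDED || s == NodeState.FAILED);
        if (allDone && state == JobState.RUNNING) {
            state = nodes.containsValue(NodeState.FAILED) ? JobState.FAILED : JobState.SUCCEEDED;
        }
    }

    private void requireState(Set<JobState> allowed, String op) {
        if (!allowed.contains(state)) {
            throw new IllegalStateException("cannot " + op + " a job in state " + state);
        }
    }
}
```

Keeping the per-node map alongside the overall state is what would let a response report "node X failed" rather than only "job failed".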
> Enable rolling restarts of Cassandra clusters via Sidecar
> ---------------------------------------------------------
>
> Key: CASSSIDECAR-274
> URL: https://issues.apache.org/jira/browse/CASSSIDECAR-274
> Project: Sidecar for Apache Cassandra
> Issue Type: Improvement
> Reporter: Isaac Reath
> Priority: Major
> Attachments: Screenshot 2025-08-13 at 12.34.43 PM.png
>
> Rolling restarts are frequently used in Cassandra to apply changes to a cluster, such as configuration changes or version upgrades. In CASSSIDECAR-266, we are adding functionality to safely start and stop a single Cassandra node via Sidecar. This ticket will build on that work to implement a coordinated rolling restart.
> The scope of this effort includes:
> * Adding API endpoints to enable operators to start, monitor, pause and stop a rolling restart.
> * Updating Sidecar to orchestrate start and stop operations across the cluster, allowing a configurable number of nodes to be offline simultaneously.
> * Creating safeguards to ensure that a rolling restart is safe to perform and does not interfere with other operations ongoing in the cluster, such as node bootstraps or decommissions.
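The ticket's scope calls for restarting a configurable number of nodes at a time and checking that the cluster is safe before proceeding. The following minimal Java sketch shows one way that loop could look, assuming a hypothetical `NodeOps` interface whose `start` returns only once the node is healthy, and a caller-supplied safety check (for example, that no bootstrap or decommission is in progress). `RollingRestartPlanner`, `planBatches`, `run` and `NodeOps` are illustrative names, not Sidecar code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical sketch of a batched rolling restart; names are illustrative only.
final class RollingRestartPlanner {

    // Assumed node operations: start(...) is assumed to return only once the node is healthy.
    interface NodeOps {
        boolean stop(String host);
        boolean start(String host);
    }

    // Splits hosts into consecutive batches of at most maxOffline nodes each.
    static List<List<String>> planBatches(List<String> hosts, int maxOffline) {
        if (maxOffline < 1) {
            throw new IllegalArgumentException("maxOffline must be >= 1");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < hosts.size(); i += maxOffline) {
            batches.add(new ArrayList<>(hosts.subList(i, Math.min(i + maxOffline, hosts.size()))));
        }
        return batches;
    }

    // Restarts batch by batch, checking clusterIsSafe before each batch.
    // Returns the hosts restarted successfully; returns early on the first failed
    // stop or start, leaving any already-stopped nodes for the caller to handle.
    static List<String> run(List<String> hosts, int maxOffline, NodeOps ops, BooleanSupplier clusterIsSafe) {
        List<String> restarted = new ArrayList<>();
        for (List<String> batch : planBatches(hosts, maxOffline)) {
            if (!clusterIsSafe.getAsBoolean()) {
                break; // e.g. a bootstrap or decommission started; do not take more nodes down
            }
            for (String host : batch) {
                if (!ops.stop(host)) {
                    return restarted;
                }
            }
            for (String host : batch) {
                if (!ops.start(host)) {
                    return restarted;
                }
                restarted.add(host);
            }
        }
        return restarted;
    }
}
```

Stopping at the first failure keeps the number of unavailable nodes bounded; a real implementation would also need to handle nodes left stopped by a partial failure, persist progress across Sidecar crashes, and support pausing or aborting between batches.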