[ 
https://issues.apache.org/jira/browse/CASSSIDECAR-266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18007335#comment-18007335
 ] 

Andres Beck-Ruiz commented on CASSSIDECAR-266:
----------------------------------------------

Agreed that recovering from a failed start or stop operation is complex and
out of scope for this first iteration. Instead, on submission of a
{{PUT /api/v1/cassandra/lifecycle}} request, the {{IntentManager}} will return
a successful response once it has submitted an asynchronous task to the
{{LifecycleProvider}}.
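
To make the submit-and-return behaviour concrete, here is a minimal Java sketch. The one-method {{LifecycleProvider}} shape, the {{AsyncSubmitSketch}} class, and {{submitStop()}} are hypothetical stand-ins for illustration, not the actual Sidecar API; only the idea of returning as soon as the task is handed off comes from the discussion.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical provider shape; the real LifecycleProvider may look different.
interface LifecycleProvider {
    void stop() throws Exception;
}

// Hypothetical helper: hands the stop operation to an executor and returns at once.
final class AsyncSubmitSketch {
    // Daemon thread so an unfinished task never keeps the JVM alive (sketch-only choice).
    private final ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
        Thread thread = new Thread(runnable, "lifecycle-sketch");
        thread.setDaemon(true);
        return thread;
    });
    private final LifecycleProvider provider;

    AsyncSubmitSketch(LifecycleProvider provider) {
        this.provider = provider;
    }

    // Returns immediately. The future completes normally or exceptionally
    // only once the provider has finished the stop operation.
    CompletableFuture<Void> submitStop() {
        return CompletableFuture.runAsync(() -> {
            try {
                provider.stop();
            } catch (Exception e) {
                throw new CompletionException(e);
            }
        }, executor);
    }

    void shutdown() {
        executor.shutdown();
    }
}
```

The handler can respond to the {{PUT}} as soon as {{submitStop()}} returns, and the outcome of the future can later be used to update what {{GET}} reports.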

Here are some additional details we discussed for the API:

1. The initial state, before {{PUT /api/v1/cassandra/lifecycle}} is called,
will be based on {{LifecycleProvider.isRunning()}}. For example, if a local
Cassandra instance is up when Sidecar is initialized,
{{GET /api/v1/cassandra/lifecycle}} would return the following:


{
    "state": "up",
    "intent": null,
    "result": null,
    "message": null
}

2. After {{PUT /api/v1/cassandra/lifecycle}} {{{state: <up/down>}}} is
submitted, {{GET /api/v1/cassandra/lifecycle}} would return the following
(shown here for a request to bring a running node down):
{
    "state": "up",
    "intent": "down",
    "result": "in_progress",
    "message": null
}

3. If a {{PUT /api/v1/cassandra/lifecycle}} is submitted while a lifecycle
operation is already in progress for a given Cassandra node (i.e., while
{{"result": "in_progress"}}), we will reject the request and return
{{409 Conflict}}.

4. Based on the result of the future submitted to {{LifecycleProvider}},
{{GET /api/v1/cassandra/lifecycle}} would return the following for a
successful request:
{
    "state": "down",
    "intent": "down",
    "result": "success",
    "message": "Node has stopped"
}
In the case of a failure:
{
    "state": "up",
    "intent": "down",
    "result": "failed",
    "message": "Unsafe to stop currently because there are not enough replicas available"
}
To be consistent, we think that intent should be returned for both successful
and failed operations.
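
As a rough illustration of the transitions in points 1 to 4, here is a small Java sketch of a status holder. {{LifecycleStatus}}, {{trySubmit}}, and {{complete}} are hypothetical names chosen for this example and are not part of Sidecar; the field values mirror the JSON examples above.

```java
import java.util.Objects;

// Hypothetical status holder mirroring the four fields returned by GET.
final class LifecycleStatus {
    private String state;    // "up" or "down"
    private String intent;   // null until the first PUT is accepted
    private String result;   // null, "in_progress", "success", or "failed"
    private String message;  // null unless an operation has completed

    // Point 1: the initial state comes from whether the node is running.
    LifecycleStatus(boolean running) {
        this.state = running ? "up" : "down";
    }

    // Points 2 and 3: models a PUT. Returns false when an operation is already
    // in progress, in which case the caller would respond with 409 Conflict.
    synchronized boolean trySubmit(String desiredState) {
        Objects.requireNonNull(desiredState, "desiredState");
        if ("in_progress".equals(result)) {
            return false;
        }
        intent = desiredState;
        result = "in_progress";
        message = null;
        return true;
    }

    // Point 4: models completion of the asynchronous task. The intent is kept
    // for both outcomes; the state only changes on success.
    synchronized void complete(boolean succeeded, String detail) {
        if (!"in_progress".equals(result)) {
            throw new IllegalStateException("no operation in progress");
        }
        if (succeeded) {
            state = intent;
        }
        result = succeeded ? "success" : "failed";
        message = detail;
    }

    // Renders the four fields like the JSON examples (no string escaping; sketch only).
    synchronized String toJson() {
        return "{\"state\": " + quote(state)
             + ", \"intent\": " + quote(intent)
             + ", \"result\": " + quote(result)
             + ", \"message\": " + quote(message) + "}";
    }

    private static String quote(String value) {
        return value == null ? "null" : "\"" + value + "\"";
    }
}
```

With this shape, a second {{trySubmit}} call made before {{complete}} returns false, and a failed operation leaves {{state}} unchanged while still reporting the {{intent}}.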

Let me know what you think.

> Implement lifecycle APIs for safely stopping, starting, and restarting local 
> Cassandra instances 
> -------------------------------------------------------------------------------------------------
>
>                 Key: CASSSIDECAR-266
>                 URL: https://issues.apache.org/jira/browse/CASSSIDECAR-266
>             Project: Sidecar for Apache Cassandra
>          Issue Type: New Feature
>          Components: Rest API
>            Reporter: Andres Beck-Ruiz
>            Assignee: Andres Beck-Ruiz
>            Priority: Normal
>
> We would like to implement APIs to safely stop, start, and restart local 
> connected Cassandra instances through Cassandra Sidecar in a generic way. 
> This could lead to future work to implement Cassandra native rolling restarts 
> in Sidecar and automate the Cassandra upgrade process. 
> We propose implementing an {{AbstractLifecycleOperationsHandler}} interface 
> that defines start, stop, restart, and status endpoints to allow Sidecar 
> operators to implement their own lifecycle handlers, depending on how they 
> host their Cassandra processes. To provide a default implementation, we would 
> create a {{LocalProcessLifecycleOperationsHandler}} to implement this 
> interface and provide lifecycle operations for OS-native Cassandra processes.
> This could be defined as the default lifecycle manager in
> {{sidecar.yaml}}, disabled by default.
> We propose the following APIs, leveraging the {{OperationalJob}} interface to 
> provide support for async non-blocking jobs. We will use the existing
> {{OperationalJobRoute}},
> {{/api/v1/cassandra/operational-jobs/:operationId}}, to track the status of
> these jobs. These endpoints will live under a {{/node}} path to specify 
> operations on the local connected Cassandra instance, allowing for future 
> development of lifecycle endpoints for an entire Cassandra cluster:
> h5. *GET /api/v1/cassandra/operations/lifecycle/node/status*
> Gets the status of whether the local Cassandra process is running. 
> h6. Response
>  * 200 OK
>  ** {{cassandra_running :: bool}}
>  * 500 Internal Server Error
>  ** {{error :: string}}
> h5. *POST /api/v1/cassandra/operations/lifecycle/node/start*
> Start the connected Cassandra process. This request will succeed if the 
> process is already started to ensure idempotency.
> h6. Parameters
>  * {{block :: boolean (default False)}}
> h6. Response
>  * 202 Accepted
>  ** {{operationId :: string}}
>  * 500 Internal Server Error
>  ** {{error :: string}}
> h5. *POST /api/v1/cassandra/operations/lifecycle/node/stop*
> Stop the connected Cassandra process after a pluggable health check passes. 
> This request will succeed if the process is already stopped to ensure 
> idempotency. 
> h6. Parameters
>  * {{block :: boolean (default False)}}
>  * {{skipHealthCheck :: boolean (default False)}}
> h6. Response
>  * 202 Accepted
>  ** {{operationId :: string}}
>  * 412 Precondition Failed
>  ** {{error :: string (health check fails)}}
>  * 500 Internal Server Error
>  ** {{error :: string}}
> h5. *POST /api/v1/cassandra/operations/lifecycle/node/restart*
> Restart the connected Cassandra process after a pluggable health check 
> passes. 
> h6. Parameters
>  * {{block :: boolean (default False)}}
>  * {{skipHealthCheck :: boolean (default False)}}
> h6. Response
>  * 202 Accepted
>  ** {{operationId :: string}}
>  * 412 Precondition Failed
>  ** {{error :: string (health check fails)}}
>  * 500 Internal Server Error
>  ** {{error :: string}}



