[jira] [Commented] (CASSANDRA-17180) Implement startup check to prevent Cassandra start to spread zombie data

Paulo Motta (Jira) Thu, 28 Apr 2022 18:44:11 -0700


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-17180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17529711#comment-17529711
 ]


Paulo Motta commented on CASSANDRA-17180:
-----------------------------------------

Added a few minor nit comments on the PR.

While testing this locally, I noticed that the heartbeat timestamp was being 
written as:
{noformat}
{"last_heartbeat":1651195035.638000000}%
{noformat}
This problem also happens with the snapshot manifest. I created [this 
commit|https://github.com/apache/cassandra/commit/f5a4e7345501c47593d723f0224c28e26ebe7b64]
 to fix this and unify JSON parsing on {{FBUtilities}}. Can you check this 
and incorporate it if it looks good?

After this fix, the heartbeat is written in ISO 8601 format:
{noformat}
{"last_heartbeat":"2022-04-29T01:25:22.906Z"}%
{noformat}
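To make the difference between the two outputs concrete, here is a minimal sketch using only {{java.time}}. The class and method names are hypothetical and this is not the code from the commit above; it only reproduces the two textual forms of the same instant.

```java
import java.time.Instant;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper, for illustration only (not part of Cassandra).
final class HeartbeatFormat {
    // Epoch seconds with a nine-digit nanosecond fraction, like the first output.
    static String asEpochDecimal(Instant t) {
        return String.format(Locale.ROOT, "%d.%09d", t.getEpochSecond(), t.getNano());
    }

    // ISO 8601 in UTC, like the output after the fix.
    static String asIso8601(Instant t) {
        return DateTimeFormatter.ISO_INSTANT.format(t);
    }

    public static void main(String[] args) {
        // Same instant as the value 1651195035.638000000 shown earlier.
        Instant t = Instant.ofEpochMilli(1651195035638L);
        System.out.println("{\"last_heartbeat\":" + asEpochDecimal(t) + "}");
        System.out.println("{\"last_heartbeat\":\"" + asIso8601(t) + "\"}");
        // The ISO 8601 form parses back to the same instant.
        System.out.println(Instant.parse(asIso8601(t)).equals(t));
    }
}
```

The ISO 8601 string is readable by an operator inspecting the heartbeat file and parses back to the same instant with {{Instant.parse}}.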
I got this output when testing manually, so it is working as expected:
{noformat}
ERROR [main] 2022-04-28 22:18:47,017 CassandraDaemon.java:896 - There are 
tables for which gcGraceSeconds is older then the lastly known time Cassandra 
node was up based on its heartbeat 
/user/.ccm/test/node1/data0/cassandra-heartbeat with timestamp 
2022-04-29T01:17:15.638Z. Cassandra node will not start as it would likely 
introduce data consistency issues (zombies etc). Please resolve these issues 
manually, then remove the heartbeat and start the node again. Invalid tables: 
ks.indexed_table
{noformat}
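The comparison behind this message can be sketched roughly as follows. This is an illustration under assumptions, not the implementation in the PR: the class name, method name, and the gc grace values in {{main}} are hypothetical, and the "now" timestamp is only an example.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the startup check, for illustration only.
final class GcGraceCheck {
    // Returns the tables whose gc grace period is shorter than the time
    // elapsed since the last heartbeat, sorted by table name.
    static List<String> invalidTables(Instant lastHeartbeat,
                                      Instant now,
                                      Map<String, Integer> gcGraceSecondsByTable) {
        Duration downtime = Duration.between(lastHeartbeat, now);
        List<String> invalid = new ArrayList<>();
        for (Map.Entry<String, Integer> e : new TreeMap<>(gcGraceSecondsByTable).entrySet()) {
            if (downtime.compareTo(Duration.ofSeconds(e.getValue())) > 0) {
                invalid.add(e.getKey());
            }
        }
        return invalid;
    }

    public static void main(String[] args) {
        Instant lastHeartbeat = Instant.parse("2022-04-29T01:17:15.638Z");
        Instant now = Instant.parse("2022-04-29T01:18:47.017Z"); // example value
        Map<String, Integer> gcGrace = new LinkedHashMap<>();
        gcGrace.put("ks.indexed_table", 60);   // assumed short gc grace for testing
        gcGrace.put("ks.other_table", 864000); // assumed 10-day gc grace
        List<String> invalid = invalidTables(lastHeartbeat, now, gcGrace);
        if (!invalid.isEmpty()) {
            System.out.println("Refusing to start. Invalid tables: " + String.join(", ", invalid));
        }
    }
}
```

A node that was down longer than a table's gc grace period may hold data whose tombstones have already been purged elsewhere, which is why any non-empty result blocks startup.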
Feel free to commit this after addressing outstanding comments + green CI.

> Implement startup check to prevent Cassandra start to spread zombie data
> ------------------------------------------------------------------------
>
>                 Key: CASSANDRA-17180
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17180
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Legacy/Observability
>            Reporter: Stefan Miklosovic
>            Assignee: Stefan Miklosovic
>            Priority: Normal
>             Fix For: 4.1
>
>          Time Spent: 12h
>  Remaining Estimate: 0h
>
> As already discussed on the ML, it would be nice to have a service that 
> periodically writes a timestamp to a file, signalling that the node is up and running.
> Then, on startup, we would read this file and determine whether there 
> is any table whose gc grace period has elapsed since that time. If so, we would 
> fail the start to prevent zombie data from being spread around the cluster.
> https://lists.apache.org/thread/w4w5t2hlcrvqhgdwww61hgg58qz13glw



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

