[
https://issues.apache.org/jira/browse/CASSANDRA-17180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17529711#comment-17529711
]
Paulo Motta commented on CASSANDRA-17180:
-----------------------------------------
Added a few minor nit comments on the PR.
While testing this locally, I noticed that the heartbeat timestamp was being
written as:
{noformat}
{"last_heartbeat":1651195035.638000000}%
{noformat}
This problem also happens with the snapshot manifest. I created [this
commit|https://github.com/apache/cassandra/commit/f5a4e7345501c47593d723f0224c28e26ebe7b64]
to fix this and unify JSON parsing in {{FBUtilities}}. Can you check this
and incorporate it if it looks good?
After this fix, the heartbeat is written in ISO 8601 format:
{noformat}
{"last_heartbeat":"2022-04-29T01:25:22.906Z"}%
{noformat}
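For reference, both representations of the same instant can be reproduced with plain {{java.time}}. This is only a minimal sketch to illustrate the difference, not the code from the linked commit; the class and method names are hypothetical.

```java
import java.time.Instant;
import java.util.Locale;

// Hypothetical helper, not part of Cassandra: renders one instant both as
// epoch seconds with a nanosecond fraction and as an ISO 8601 string.
class HeartbeatTimestampFormats
{
    // Epoch seconds followed by a nine-digit nanosecond fraction.
    static String asEpochDecimal(Instant instant)
    {
        return String.format(Locale.ROOT, "%d.%09d", instant.getEpochSecond(), instant.getNano());
    }

    // ISO 8601 in UTC, as produced by Instant.toString().
    static String asIso8601(Instant instant)
    {
        return instant.toString();
    }

    public static void main(String[] args)
    {
        // Millisecond value taken from the heartbeat shown earlier in this comment.
        Instant heartbeat = Instant.ofEpochMilli(1651195035638L);
        System.out.println(asEpochDecimal(heartbeat)); // 1651195035.638000000
        System.out.println(asIso8601(heartbeat));      // 2022-04-29T01:17:15.638Z
    }
}
```

The ISO 8601 form is readable by an operator inspecting the file and does not depend on how a particular JSON library chooses to serialize {{Instant}}.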
I got this output when testing manually, so it is working as expected:
{noformat}
ERROR [main] 2022-04-28 22:18:47,017 CassandraDaemon.java:896 - There are
tables for which gcGraceSeconds is older then the lastly known time Cassandra
node was up based on its heartbeat
/user/.ccm/test/node1/data0/cassandra-heartbeat with timestamp
2022-04-29T01:17:15.638Z. Cassandra node will not start as it would likely
introduce data consistency issues (zombies etc). Please resolve these issues
manually, then remove the heartbeat and start the node again. Invalid tables:
ks.indexed_table
{noformat}
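The idea behind this error can be sketched roughly as follows. This is not the actual startup check from the PR: the class name, method name, and gc grace values are hypothetical, and it assumes a table is flagged when the time elapsed since the last heartbeat exceeds that table's gc grace period.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the Cassandra implementation.
class GcGraceStartupCheckSketch
{
    // Returns the tables whose gc grace period ended before 'now',
    // counting from the last recorded heartbeat.
    static List<String> invalidTables(Instant lastHeartbeat, Instant now, Map<String, Integer> gcGraceSecondsByTable)
    {
        List<String> invalid = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : gcGraceSecondsByTable.entrySet())
        {
            Instant gcGraceExpiry = lastHeartbeat.plusSeconds(entry.getValue());
            if (gcGraceExpiry.isBefore(now))
                invalid.add(entry.getKey());
        }
        return invalid;
    }

    public static void main(String[] args)
    {
        Instant lastHeartbeat = Instant.parse("2022-04-29T01:17:15.638Z");
        // Assume the node is started one minute after its last heartbeat.
        Instant now = lastHeartbeat.plus(Duration.ofMinutes(1));

        Map<String, Integer> gcGrace = new LinkedHashMap<>();
        gcGrace.put("ks.indexed_table", 10);   // assumed value, chosen low for the example
        gcGrace.put("ks.other_table", 864000); // hypothetical table with a 10 day gc grace

        // Only the table with the short gc grace is reported.
        System.out.println(invalidTables(lastHeartbeat, now, gcGrace));
    }
}
```

A real check would refuse to start the node when the returned list is non-empty, as the log message above describes.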
Feel free to commit this after addressing outstanding comments + green CI.
> Implement startup check to prevent Cassandra start to spread zombie data
> ------------------------------------------------------------------------
>
> Key: CASSANDRA-17180
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17180
> Project: Cassandra
> Issue Type: New Feature
> Components: Legacy/Observability
> Reporter: Stefan Miklosovic
> Assignee: Stefan Miklosovic
> Priority: Normal
> Fix For: 4.1
>
> Time Spent: 12h
> Remaining Estimate: 0h
>
> As already discussed on the ML, it would be nice to have a service which would
> periodically write a timestamp to a file, signalling that the node is up and running.
> Then, on startup, we would read this file and determine whether there
> is any table whose gc grace period has elapsed since that time. If so, we would
> fail the start to prevent zombie data from being spread around the cluster.
> https://lists.apache.org/thread/w4w5t2hlcrvqhgdwww61hgg58qz13glw