[ 
https://issues.apache.org/jira/browse/CASSANDRA-17180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525057#comment-17525057
 ] 

Stefan Miklosovic edited comment on CASSANDRA-17180 at 4/20/22 3:05 PM:
------------------------------------------------------------------------

Changes from the last review:

1) added prefix "check_" to checks in the configuration
2) updated documentation in cassandra.yaml
3) fixed retrieval of table metadata
4) ignoring tables which have gc_grace_seconds = 0 (tables in system_traces 
conveniently have gc_grace_seconds = 0, so ignoring 0 in general covers them too).
5) moved heartbeat scheduling to after the checks are done 
6) renamed the default heartbeat file from ".cassandra-heartbeat" to 
"cassandra-heartbeat" (it is not hidden anymore).

I made no modification to SchemaKeyspace#fetchNonSystemKeyspaces except 
making it public. This method does not return the "system" and 
"system_schema" keyspaces. The check's own logic filters out "system_traces".

On the other hand, it will check tables in "system_distributed" as well as 
"system_auth". These tables do not have gc_grace_seconds = 0 and their 
keyspaces are not excluded by the fetchNonSystemKeyspaces call.

I do not have enough operational experience to judge whether we should also 
exclude the system keyspaces mentioned above.
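
To make the filtering described above concrete, here is a minimal sketch of the 
decision, not the code from the patch. The class GcGraceCheckSketch, the Table 
holder and the method expiredTables are hypothetical names invented for this 
illustration, and the keyspace and table names in main are examples only. It 
assumes the check compares the time elapsed since the last heartbeat against 
each table's gc_grace_seconds and skips tables where that value is 0.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GcGraceCheckSketch {

    /** Hypothetical stand-in for table metadata; not Cassandra's own class. */
    public static final class Table {
        public final String keyspace;
        public final String name;
        public final int gcGraceSeconds;

        public Table(String keyspace, String name, int gcGraceSeconds) {
            this.keyspace = keyspace;
            this.name = name;
            this.gcGraceSeconds = gcGraceSeconds;
        }
    }

    /**
     * Returns the tables whose gc grace period is shorter than the time that
     * has passed since the last heartbeat. Tables with gc_grace_seconds = 0
     * are ignored, as described in item 4 above.
     */
    public static List<Table> expiredTables(List<Table> tables, Instant lastHeartbeat, Instant now) {
        long secondsDown = Duration.between(lastHeartbeat, now).getSeconds();
        List<Table> expired = new ArrayList<>();
        for (Table table : tables) {
            if (table.gcGraceSeconds == 0)
                continue; // assumption: 0 means the table is exempt from the check
            if (secondsDown > table.gcGraceSeconds)
                expired.add(table);
        }
        return expired;
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        // Illustrative data only: the node is assumed to have been down for two hours.
        List<Table> tables = Arrays.asList(
            new Table("system_traces", "events", 0),
            new Table("app", "users", 3600),
            new Table("app", "orders", 864000));
        for (Table t : expiredTables(tables, now.minusSeconds(7200), now))
            System.out.println("gc grace exceeded: " + t.keyspace + "." + t.name);
    }
}
```

In this sketch a startup check would refuse to start the node whenever the 
returned list is not empty. Excluding further keyspaces, if that turns out to 
be desirable, would be one more condition in the loop.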


was (Author: smiklosovic):
Changes from the last review:

1) added prefix "check_" to checks in the configuration
2) updated documentation in cassandra.yaml
3) fixed retrieval of table metadata
4) ignoring tables which have gc = 0
5) moved heartbeating scheduling after checks are done 
6) renamed default name of heartbeat file from ".cassandra-heartbeat" to 
"cassandra-heartbeat" (it is not hidden anymore).

> Implement startup check to prevent Cassandra start to spread zombie data
> ------------------------------------------------------------------------
>
>                 Key: CASSANDRA-17180
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17180
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Legacy/Observability
>            Reporter: Stefan Miklosovic
>            Assignee: Stefan Miklosovic
>            Priority: Normal
>          Time Spent: 9.5h
>  Remaining Estimate: 0h
>
> As already discussed on the ML, it would be nice to have a service which would 
> periodically write a timestamp to a file, signalling that the node is up and 
> running. Then, on startup, we would read this file and determine whether there 
> is some table whose gc grace period has elapsed since that time; if so, we 
> would fail the start to prevent zombie data from being spread around the cluster.
> https://lists.apache.org/thread/w4w5t2hlcrvqhgdwww61hgg58qz13glw
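
The heartbeat idea in the description can be sketched as follows. This is an 
illustration under stated assumptions, not the implementation from the patch: 
the class HeartbeatFileSketch and its methods are hypothetical, and the file 
format (a single epoch-millisecond value as text) is an assumption made only 
for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Instant;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class HeartbeatFileSketch {

    /** Overwrites the heartbeat file with the given time as epoch milliseconds. */
    public static void writeHeartbeat(Path file, Instant now) {
        try {
            byte[] bytes = Long.toString(now.toEpochMilli()).getBytes(StandardCharsets.UTF_8);
            Files.write(file, bytes);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the last recorded heartbeat back from the file. */
    public static Instant readHeartbeat(Path file) {
        try {
            String content = new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
            return Instant.ofEpochMilli(Long.parseLong(content));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Starts a daemon thread that refreshes the heartbeat file periodically.
     * Note that scheduleAtFixedRate stops rescheduling a task once it throws,
     * so a real service would need to catch and log write failures.
     */
    public static ScheduledExecutorService startHeartbeat(Path file, long periodSeconds) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor(runnable -> {
            Thread thread = new Thread(runnable, "heartbeat-sketch");
            thread.setDaemon(true);
            return thread;
        });
        executor.scheduleAtFixedRate(() -> writeHeartbeat(file, Instant.now()),
                                     0, periodSeconds, TimeUnit.SECONDS);
        return executor;
    }
}
```

On startup, the value returned by readHeartbeat would be compared with the 
current time, and startHeartbeat would only be called once the startup checks 
have passed, so that a failed start does not refresh the file.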



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
