[ https://issues.apache.org/jira/browse/CASSANDRA-17180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17453037#comment-17453037 ]

Paulo Motta commented on CASSANDRA-17180:
-----------------------------------------

My point is that users should not need to be aware of the "heartbeat" term, since 
it is an implementation detail of the "check_gc_grace_seconds_on_startup" 
feature, which requires a heartbeat file to track the last time the node was 
up. In fact, a heartbeat file would not be needed at all if it were not for 
this feature.
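To make the mechanism concrete, here is a minimal Java sketch of what a heartbeat writer could look like, assuming the file simply holds the last time the node was seen up, as epoch milliseconds. The class name HeartbeatWriter, its methods and the file layout are hypothetical and not taken from the Cassandra codebase. Writing to a temporary file and renaming it over the real one is one way to avoid leaving a half-written timestamp behind if the node crashes mid-write.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.time.Instant;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not Cassandra's implementation: periodically records
// "the node is up" as epoch milliseconds in a single file.
final class HeartbeatWriter implements AutoCloseable {
    private final Path heartbeatFile;
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    HeartbeatWriter(Path heartbeatFile) {
        this.heartbeatFile = heartbeatFile;
    }

    // Write to a temp file first, then rename it over the heartbeat file, so a
    // crash mid-write never leaves a truncated timestamp behind. ATOMIC_MOVE
    // may be unsupported on some file systems (AtomicMoveNotSupportedException).
    void writeHeartbeat() throws IOException {
        Path tmp = heartbeatFile.resolveSibling(heartbeatFile.getFileName() + ".tmp");
        byte[] now = Long.toString(System.currentTimeMillis()).getBytes(StandardCharsets.UTF_8);
        Files.write(tmp, now);
        Files.move(tmp, heartbeatFile,
                StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
    }

    // Schedule a heartbeat every periodSeconds, starting immediately.
    void start(long periodSeconds) {
        scheduler.scheduleAtFixedRate(() -> {
            try {
                writeHeartbeat();
            } catch (IOException e) {
                // Catching keeps the schedule alive; an uncaught exception would cancel it.
                System.err.println("heartbeat write failed: " + e);
            }
        }, 0, periodSeconds, TimeUnit.SECONDS);
    }

    // Returns the last recorded heartbeat, or null if the file does not exist yet.
    static Instant readLastHeartbeat(Path file) throws IOException {
        if (!Files.exists(file)) {
            return null;
        }
        String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
        return Instant.ofEpochMilli(Long.parseLong(text));
    }

    // Stops the (non-daemon) scheduler thread.
    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```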

So my suggestion is not to expose this feature to users as "heartbeat", to avoid 
leaking implementation details. What users care about is simply that startup 
fails if the node has been down for longer than gc_grace_seconds on any table, 
so that is the feature we should expose. I would suggest something along these 
lines in the current configuration format:
{noformat}
check_gc_grace_seconds_on_startup:
    enabled: true
    ignored_tables:
        - ks1.tb1
        - ks2.tb2
        - ks3    # would ignore the whole keyspace
    heartbeat_file: .cassandra-heartbeat   # advanced property; maybe this could be a system property?
    heartbeat_period: 60s                  # advanced property; maybe this could be a system property?
{noformat}
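One possible reading of the proposed ignored_tables entries, sketched in Java: an entry containing a dot names a single table, and an entry without one names a whole keyspace. The IgnoredTables class is hypothetical, and it assumes exact, case-sensitive matching of names as written in the configuration.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: interprets ignored_tables entries from the proposed config.
final class IgnoredTables {
    private final Set<String> keyspaces = new HashSet<>();
    private final Set<String> tables = new HashSet<>();

    IgnoredTables(List<String> entries) {
        for (String raw : entries) {
            String entry = raw.trim();
            if (entry.isEmpty()) {
                continue;
            }
            if (entry.indexOf('.') < 0) {
                keyspaces.add(entry);   // "ks3" -> ignore the whole keyspace
            } else {
                tables.add(entry);      // "ks1.tb1" -> ignore a single table
            }
        }
    }

    // True if the table is skipped by the gc_grace_seconds startup check.
    boolean isIgnored(String keyspace, String table) {
        return keyspaces.contains(keyspace) || tables.contains(keyspace + "." + table);
    }
}
```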

Regarding the refactoring of the startup checks: this was more a suggestion for 
a future improvement, and we shouldn't block this ticket on it. We should just 
keep that future direction in mind so we can easily transpose the property to 
the new format later.

> Implement heartbeat service to know last time Cassandra node was up
> -------------------------------------------------------------------
>
>                 Key: CASSANDRA-17180
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17180
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Legacy/Observability
>            Reporter: Stefan Miklosovic
>            Assignee: Stefan Miklosovic
>            Priority: Normal
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> As already discussed on the ML, it would be nice to have a service which 
> periodically writes a timestamp to a file, signalling that the node is up and running.
> Then, on startup, we would read this file and determine whether there is any 
> table whose gc_grace_seconds has elapsed since that timestamp; if so, we would 
> fail the startup to prevent zombie data from being spread around the cluster.
> https://lists.apache.org/thread/w4w5t2hlcrvqhgdwww61hgg58qz13glw
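The startup check described in the ticket could be sketched as follows, assuming the last heartbeat has already been read as an Instant and that gc_grace_seconds has been collected per table into a map keyed by "keyspace.table". GcGraceStartupCheck and its methods are hypothetical names, not Cassandra's API.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the startup check described in the ticket.
final class GcGraceStartupCheck {
    // Returns the tables whose gc_grace_seconds is shorter than the downtime,
    // i.e. tables for which tombstones may already have been purged elsewhere.
    static List<String> violations(Instant lastHeartbeat, Instant now,
                                   Map<String, Integer> gcGraceSecondsByTable) {
        List<String> result = new ArrayList<>();
        if (lastHeartbeat == null) {
            return result; // no heartbeat yet (e.g. first start): nothing to compare
        }
        long downSeconds = Duration.between(lastHeartbeat, now).getSeconds();
        // TreeMap only makes the reported order deterministic.
        for (Map.Entry<String, Integer> e : new TreeMap<>(gcGraceSecondsByTable).entrySet()) {
            if (downSeconds > e.getValue()) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    // Fails startup if any table's gc grace period has elapsed while the node was down.
    static void check(Instant lastHeartbeat, Instant now, Map<String, Integer> gcGraceSecondsByTable) {
        List<String> bad = violations(lastHeartbeat, now, gcGraceSecondsByTable);
        if (!bad.isEmpty()) {
            throw new IllegalStateException("Node was down longer than gc_grace_seconds for "
                    + bad + "; refusing to start to avoid resurrecting deleted data");
        }
    }
}
```

For example, with a node down for two days, a table using the default 864000 seconds (10 days) passes, while a table with gc_grace_seconds of 3600 would be reported.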



--
This message was sent by Atlassian Jira
(v8.20.1#820001)