[ 
https://issues.apache.org/jira/browse/CASSANDRA-13561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16094112#comment-16094112
 ] 

Kurt Greaves commented on CASSANDRA-13561:
------------------------------------------

I disregarded HH because HH cannot be relied on to provide consistency 
guarantees. The scenario is still bad if either the node is down for longer 
than the HH window, or if HH fails for some reason.

Also note that GCGS=0 disables hinted handoff for the table (at least last time 
I checked this hadn't changed). So it's not exactly the same.

I see the point that this could be useful for cases where a default TTL is set, 
however even with a default TTL you can still update/remove the TTL of columns. 
This means the risk is only really mitigated where you set a default TTL, and 
you never do anything to alter that TTL. 

The best you can currently do in this case is set GCGS equal to the hinted 
handoff window: you don't sacrifice consistency, and you only keep the 
expired cells around for a minimum of 3 hours. This is perfectly fine when 
you are using TWCS/DTCS, as the SSTables should expire efficiently and it's 
unlikely you'd be querying expired data anyway.
The only case I can think of where you would really get a benefit without any 
risk is when using the same write strategy (default TTL/always TTL) but on a 
table that doesn't work with TWCS/DTCS, so you use STCS/LCS, and you also have 
to keep GCGS high because you also do manual deletes.
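
To make the trade-off concrete, here is a rough Python sketch of how long an 
expired cell lingers under two GCGS settings. It uses a deliberately 
simplified model (an assumption for illustration, not Cassandra's actual 
purge logic): an expired cell behaves like a tombstone and becomes eligible 
for purging once GCGS has elapsed after expiry. The function name is 
hypothetical; the two constants are the default gc_grace_seconds (10 days) 
and the default hinted handoff window (3 hours).

```python
# Simplified model (an assumption, not Cassandra's real purge code):
# an expired cell acts as a tombstone and may be dropped by compaction
# once gc_grace_seconds have passed since it expired.

DEFAULT_GC_GRACE_SECONDS = 864_000   # default gc_grace_seconds (10 days)
HINT_WINDOW_SECONDS = 3 * 60 * 60    # default hinted handoff window (3 hours)


def earliest_purge_time(write_time: int, ttl: int, gc_grace_seconds: int) -> int:
    """Earliest time (seconds) at which the cell is purgeable in this model."""
    expires_at = write_time + ttl
    return expires_at + gc_grace_seconds


if __name__ == "__main__":
    write_time, ttl = 0, 3600  # cell written at t=0 with a one-hour TTL
    for label, gcgs in [("GCGS = hint window", HINT_WINDOW_SECONDS),
                        ("GCGS = default", DEFAULT_GC_GRACE_SECONDS)]:
        purge_at = earliest_purge_time(write_time, ttl, gcgs)
        print(f"{label}: purgeable {purge_at - (write_time + ttl)}s after expiry")
```

Actual removal still depends on when compaction runs and on whether 
overlapping SSTables block the purge, so these figures are lower bounds.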

Your proposal would make it so that you can remove data a little bit faster if 
it compacts within GCGS. I'm a bit skeptical that that's actually necessary, 
especially with the introduction of {{provide_overlapping_tombstones}} in 3.10, 
which should allow much more efficient removal of tombstones.
I'd be very surprised if you're generating so many tombstones per partition/PK 
within GCGS plus the time to compaction that it causes significant latency. 
I'd be interested to see some metrics surrounding this, and to confirm that 
other options don't perform well enough first. Maybe you could give us an 
example of a use case where this gave you the benefits?

I feel that this is a sufficiently dangerous option, with hard-to-understand 
implications, that we should need pretty good justification before making it 
readily configurable at the table level. That just screams 
impending-shoot-yourself-in-the-foot. Maybe we could put some other safety net 
around it (e.g. a property passed into C*) that doesn't allow changing it unless 
you start C* with that option set, but yeah, let's figure out some concrete 
benefits first.


> Purge TTL on expiration
> -----------------------
>
>                 Key: CASSANDRA-13561
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13561
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Andrew Whang
>            Assignee: Andrew Whang
>            Priority: Minor
>             Fix For: 4.0
>
>
> Tables with mostly TTL columns tend to suffer from high droppable tombstone 
> ratio, which results in higher read latency, cpu utilization, and disk usage. 
> Expired TTL data become tombstones, and the nature of purging tombstones 
> during compaction (due to checking for overlapping SSTables) makes them 
> susceptible to surviving much longer than expected. A table option to purge 
> TTL on expiration would address this issue, by preventing them from becoming 
> tombstones. A boolean purge_ttl_on_expiration table setting would allow users 
> to easily turn the feature on or off. 
> Being more aggressive with gc_grace could also address the problem of long 
> lasting tombstones, but that would affect tombstones from deletes as well. 
> Even if a purged [expired] cell is revived via repair from a node that hasn't 
> yet compacted away the cell, it would be revived as an expiring cell with the 
> same localDeletionTime, so reads should properly handle them. As well, it 
> would be purged in the next compaction. 


